Similarities of integumentary mucin B.1 from Xenopus laevis and prepro-von Willebrand factor at their amino-terminal regions.

Frog integumentary mucin B.1 (FIM-B.1) contains various cysteine-rich modules. In the past, a COOH-terminal “cystine knot” motif has been found that is similar to von Willebrand factor; this region is generally known to be responsible for dimerization processes. Furthermore, a “complement control protein” motif is present as an internal cysteine-rich domain in FIM-B.1. We characterize here the missing 75% toward the NH2 terminus of the FIM-B.1 precursor by molecular cloning. Analogous to prepro-von Willebrand factor, four elements with considerable similarity to D-domains are present (i.e. D1-D2-D′-D3). These domains have been described as essential for the multimerization of von Willebrand factor. Thus, the general structure of FIM-B.1 resembles that of the human mucin MUC2 as well as prepro-von Willebrand factor; these three molecules at least seem to share common structural elements allowing similar multimerization mechanisms.

During phylogeny, mucus gels have been conserved as the essential extracellular matrices that protect delicate epithelial surfaces in many ways (1,2). Mucins have been established as the molecules that primarily determine the defined rheological and viscoelastic properties of these gels. The key step in the formation of such a three-dimensional complex network is the ordered aggregation of linear rodlike monomeric mucins (3). The stiff and extended conformation of the monomers is the result of highly O-glycosylated repetitive serine/threonine-rich regions (4). In contrast, aggregation to multimers is achieved via cysteine-rich modules. Two models would describe such a network: (i) a cross-linked network model and (ii) an entangled network model. However, only the latter fulfills the physicochemical criteria that define a dynamic mucus gel (5).
Due to technical problems, complete molecular structures of mucins are still rare. For example, more than seven human MUC genes (6), bovine and porcine salivary mucins (7,8) as well as three frog integumentary mucins (FIM-A.1,¹ FIM-B.1, FIM-C.1) have been at least partially characterized (9). The latter represent typical extracellular mosaic proteins with astonishing structural similarities to other peptides and proteins (10). FIM-B.1 from Xenopus laevis certainly shows the most interesting molecular architecture. However, only about 25% from the COOH-terminal portion of the sequence has been reported thus far. In addition to a variable number of O-glycosylated type B repeats responsible for polydispersities (11), FIM-B.1 contains at least two different cysteine-rich modules: (i) internally, the "complement control protein motif" (CP, also known as "Sushi structure" or "short consensus repeat" (SCR) (12)) and (ii) a COOH-terminal region with homology to von Willebrand factor (vWF) (13). In vWF, this part is responsible for dimerization (14). A motif spanning 11 cysteine residues named "cystine knot" has been proposed as the active site of the latter (15) and is responsible for dimerization of certain cytokines as well (16). Subsequently, the cystine knot motif has also been found in a variety of other mucins, e.g. bovine salivary mucin (7), porcine salivary mucin (8), MUC2 (17), rMUC2/rMLP (18), and MUC5 (19) as well as the human sublingual gland mucin MG1 (20). Recently, dimerization of MUC2 has been reported (21), and for porcine salivary mucin it has been clearly shown that it forms dimers via its COOH-terminal domain (22). Thus, this motif is now considered to trigger homodimerization as an early event in the biosynthesis of many mucins.
We report here the full-length sequence of the FIM-B.1 precursor starting with the signal sequence as deduced from cDNA cloning.

EXPERIMENTAL PROCEDURES
Isolation of mRNA from the skin of a single adult X. laevis (purchased from the Herpetological Institute, Dr. W. de Rover, Belgium), cyclic thermal amplification via the polymerase chain reaction (PCR), and purification and sequencing of plasmid DNA as well as computerized analysis and homology searches have been described previously (12).
In order to elongate the incompletely known nucleotide sequence encoding the COOH-terminal portion of FIM-B.1 toward the 5′-end, a multistep amplification procedure (RACE protocol) has been employed (23). Starting from the region encoding the CP motif (12), the oligonucleotide SCR1 d(CACAGCTTGGTGTATTTC) was used as a specific primer for cDNA synthesis. After dC tailing, amplification occurred with Taq polymerase and a combination of oligonucleotides REP7 d(CCCTCGAGAATTCGGATCCTGCTACCGTTCCGTTT) and PCR5′ d(CCGGATCCTCGAGAATTCTAGA(G)14). The underlined region is complementary to part of the CP motif in FIM-B.1 (12). After subcloning the products into the BamHI/EcoRI sites of pBluescript-II/SK− (Stratagene), clone pS5R7-2 was obtained. Further cDNA clones were generated in a similar way by a multistep amplification procedure using a set of specific primers toward the 5′-end (Fig. 1C).
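The adapter portions of the primers named above can be inspected with a short script. The following is a minimal sketch, not part of the original protocol: it assumes that the hyphens printed inside the oligonucleotide sequences are line-break artifacts, and it uses the standard recognition sequences of BamHI, EcoRI, XhoI, and XbaI. The function names are hypothetical.

```python
# Standard recognition sequences of four common restriction enzymes.
SITES = {
    "BamHI": "GGATCC",
    "EcoRI": "GAATTC",
    "XhoI": "CTCGAG",
    "XbaI": "TCTAGA",
}

# Primer sequences as printed in the text (5' -> 3').
# Assumption: hyphens inside the printed sequences are line-break artifacts.
PRIMERS = {
    "SCR1": "CACAGCTTGGTGTATTTC",
    "REP7": "CCCTCGAGAATTCGGATCCTGCTACCGTTCCGTTT",
    "PCR5'": "CCGGATCCTCGAGAATTCTAGA" + "G" * 14,
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_sites(seq):
    """Map enzyme name -> 0-based index of the first occurrence of its site."""
    return {name: seq.find(site) for name, site in SITES.items() if site in seq}


if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        print(name, len(seq), find_sites(seq))
    # Strand to which the cDNA synthesis primer SCR1 anneals.
    print(reverse_complement(PRIMERS["SCR1"]))
```

Under these assumptions, both amplification primers carry BamHI and EcoRI sites in their 5′ adapter regions, which is consistent with subcloning into the BamHI/EcoRI sites of the vector.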
* Financial support was received from the "Fonds der Chemischen Industrie." The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
The nucleotide sequence(s) reported in this paper has been submitted to the GenBank™/EBI Data Bank with accession number(s) Y08296.
¶ To whom correspondence should be addressed: Institut für Molekularbiologie und Medizinische Chemie, Universitätsklinikum, Leipziger Str. 44, D-39120 Magdeburg, Germany. Fax: 49-391-67-13-096.
¹ The abbreviations used are: FIM, frog integumentary mucin; PCR, polymerase chain reaction; vWF, von Willebrand factor; RACE, rapid amplification of cDNA ends; SCR, short consensus repeat; CP, complement control protein motif.

RESULTS
Based on this sequence information obtained from relatively short cDNA clones, long overlapping cDNA clones were generated by PCR and subsequently analyzed (Fig. 1B). Fig. 2 represents the cDNA sequence obtained from a set of overlapping clones using the RACE protocol toward the very 5′-end. The deduced amino acid sequence encodes the amino-terminal portion of the FIM-B.1 precursor, starting with the signal sequence until it reaches the CP motif (which served as the anchor for the first specific oligonucleotide used). However, the CP motif cloned here does not show the identical sequence as characterized previously (12). The two CP motifs differ in precisely two point mutations also changing two amino acid residues: K to E and A to G. These two mutations have been confirmed to be highly specific by the analysis of a series of independent cDNA clones. In order to distinguish between these two CP motifs, we designated them SCR (12) and SCR* (sequence from Fig. 2). Thus, we assumed that the FIM-B.1 precursor could theoretically contain at least two CP motifs that differ slightly.
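That a single point mutation suffices for each of the two residue changes (K to E and A to G) can be checked against the standard genetic code. The sketch below is illustrative only: the actual codons involved are not stated in the text, so the script enumerates every single-base substitution that converts one amino acid into another. The function name is hypothetical.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def single_base_routes(src, dst):
    """List (codon, mutant_codon, position, old_base, new_base) for every
    single-nucleotide substitution that turns amino acid src into dst.
    Positions are 1-based within the codon."""
    routes = []
    for codon, aa in CODON_TABLE.items():
        if aa != src:
            continue
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODON_TABLE[mutant] == dst:
                    routes.append((codon, mutant, pos + 1, codon[pos], base))
    return routes


if __name__ == "__main__":
    for src, dst in [("K", "E"), ("A", "G")]:
        print(src, "->", dst, single_base_routes(src, dst))
```

In the standard code, every K to E route is an A to G change at the first codon position, and every A to G (alanine to glycine) route is a C to G change at the second codon position.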
To test this hypothesis, oligo(dT)-primed cDNA from X. laevis skin was amplified with Taq polymerase using the oligonucleotides FIM8 d(CCCGGATCCTCGAGAATTCAAATCAAGCTATAACAG) and SCR4 d(CCCGGATCCGCACAACCTCCCTTTTT). The underlined part in FIM8 represents positions 3979-3995 from Fig. 2, and SCR4 is complementary to the SCR motif (12) and does not recognize SCR*. After subcloning the PCR products into the BamHI/EcoRI sites of pBluescript-II/SK−, clones pF8S4.2-5, -7, and -8 were characterized (Fig. 3). All three clones indeed contained two different CP motifs (i.e. SCR* and SCR). However, the clones were not identical but differed in their repetitive parts by specific insertions and deletions. Such polydispersities are typical of FIM-B.1 and have been shown to result from alternative splicing of repetitive cassettes (11).

DISCUSSION
The combined amino acid sequences deduced in Figs. 2 and 3 now complete the missing amino-terminal portion of FIM-B.1. Together with the published COOH-terminal part (12,13), the FIM-B.1 precursor consists of at least about 2700 amino acid residues (Fig. 4) encoded by a polydisperse mRNA population with a length of more than 8.3 kilobases. This is in fairly good agreement with Northern blot analysis which revealed a smear of up to 10 kilobases (24). As indicated in Fig. 4, the difference is probably due to the existence of a polydisperse cluster of additional CP motifs and repetitive highly O-glycosylated regions (25). In particular, multiple CP motifs could represent potential anchor points that non-covalently cross-link mucin subunits.
The amino-terminal portion of the FIM-B.1 precursor presented here can be clearly divided into separate domains. As is typical of secretory proteins, the sequence starts with a hydrophobic signal sequence that is probably cleaved off after alanine 19 (Fig. 2). Then a mainly basic repetitive region follows with the motif PAKGG. For this glycine-rich sequence (extending to glycine 77), a β-turn structure can primarily be expected. Similar terminal sequences have been detected in cytokeratins (26) and synapsins (27). Starting with proline 78, the pattern changes drastically to a threonine-rich sequence also containing proline and alanine. Such a composition is typical of mucins (2,4); however, the acidic residues flanking some threonine residues at position −1 probably diminish their potential to become O-glycosylated (28). Similarly, as shown previously for type B repeats (12), analysis of further cDNA clones revealed polydispersities by insertion of a variable number of tandem repeats with the motif PAATDSET after amino acid 122 (25). Thus, the sequence given in Fig. 2 represents a minimal length variant within a polydisperse population.
[…] Fig. 1B, which were all obtained from a single individual. Two point mutations were observed in clone pF18F38-3 in its overlapping region with pF30F56-4. The T at position 1950 is changed to C (Ile → Thr mutation on amino acid level) and at position 2147 the G is changed to A (Asp → Asn mutation). Restriction sites are marked. The presumed signal sequence of the precursor, as well as potential N-glycosylation sites, is underlined. Also denoted are the cysteine-rich domains homologous to vWF (D1-D2-D′-D3) and the CP motif.

Certainly one of the most interesting domains in FIM-B.1 is the cysteine-rich region between positions 172 and 1330 (Fig. 2) because it reveals pronounced similarities with pro-vWF (29). In particular, three subdomains with internal homology (named D1, D2, and D3) as well as a truncated version located between D2 and D3 (designated as D′) can be recognized. This set of D-domains has been reported to be obligatory for multimer assembly of pro-vWF (30). This biosynthetic event occurs unusually late in trans-Golgi and post-Golgi acidic compartments (30) and seems to be independent of dimerization in the endoplasmic reticulum (31). Furthermore, multimerization via the D1 and D2 domains plays an important role in storage granule formation (32). Small vWF multimers are secreted constitutively, whereas large multimers are packed into Weibel-Palade bodies and then released via the regulated pathway (30). An analogous domain structure (as in vWF and FIM-B.1) has also been reported for the amino-terminal part of MUC2 (33) (which also forms multimers (34)). As shown in Fig. 5, nearly all cysteine residues are conserved in these three molecules. However, the general similarity of the sequences is not particularly pronounced. The two most conserved continuous stretches of amino acid residues are regions in the D1 and the D3 domain with the sequences TCGLCG and VCGLCGN, respectively.
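The core shared by the two conserved stretches can be extracted mechanically. The following minimal sketch takes only the two peptide strings quoted in the text as input, finds their longest common substring, and locates cysteine pairs separated by two residues (a C-x-x-C spacing). The helper names are hypothetical, and the C-x-x-C pattern is used here purely as a descriptive search pattern.

```python
import re

# The two conserved stretches quoted in the text (one-letter amino acid code).
STRETCHES = {"D1": "TCGLCG", "D3": "VCGLCGN"}

# Two cysteines separated by exactly two arbitrary residues.
CXXC = re.compile(r"C..C")


def longest_common_substring(a, b):
    """Return the longest contiguous substring shared by a and b."""
    best = ""
    for i in range(len(a)):
        for j in range(i + 1, len(a) + 1):
            piece = a[i:j]
            if len(piece) > len(best) and piece in b:
                best = piece
    return best


def cxxc_hits(seq):
    """Return (0-based start, matched text) for each non-overlapping C-x-x-C."""
    return [(m.start(), m.group()) for m in CXXC.finditer(seq)]


if __name__ == "__main__":
    print(longest_common_substring(STRETCHES["D1"], STRETCHES["D3"]))
    for domain, seq in STRETCHES.items():
        print(domain, cxxc_hits(seq))
```

Both stretches contain the pentapeptide CGLCG, in which the two cysteine residues are separated by two residues.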
Remarkably, the vicinal cysteine residues in the underlined CGLCG motifs are similar to those at the active site of disulfide isomerase, and they have been proposed to play a role in multimerization of pro-vWF (35). In the mature vWF (after cleavage of its pro-sequence (i.e. at the D2/D′ junction; see […] pro-peptide. This sequence is close to the equivalent position in pro-vWF and also remarkably resembles the known processing site in the vWF precursor (sequence RSKR↓S; Fig. 5). It is noteworthy that proteolytic cleavage of pro-vWF is not essential for multimer formation (38). Taken together, many mucins seem to mimic the covalent stepwise aggregation of vWF to linear clusters. Molecular structures supporting such a model are now available for MUC2 (33), rMUC2 (39), FIM-B.1, and obviously also porcine salivary mucin (22). Furthermore, partial sequences of MUC5 (19), bovine salivary mucin (7), and MG1 (20) indicate that these mucins could follow the same common hypothetical scheme. Also, the sperm membrane protein zonadhesin (40) containing a mucin-like domain and a cluster of D-domains would be a candidate for a similar molecular mechanism. However, based on the observation that vWF D-domains bind heparin (41), non-covalent interactions of mucin D-domains with sulfated carbohydrates should also be taken into consideration. Such a lectin bond-mediated polymerization model has already been proposed in the past for mucus gels (42).
Acknowledgments-We thank U. Schimanko and C. Cap for oligonucleotide synthesis.

[…] Fig. 2) is compared with the D1-D2-D′-D3 domains of prepro-vWF (29) and the amino-terminal part of MUC2 (33). Gaps are introduced to maximize similarity. Identical amino acid residues in prepro-vWF and MUC2 (when compared with FIM-B.1) are enclosed in boxes. The cleavage site in prepro-vWF is indicated by an arrow, as well as a potential processing site in FIM-B.1. Triangles indicate cysteine residues probably involved in homophilic intermolecular disulfide bridges.