The Structure and Assembly of Secreted Mucins

Mucins are major glycoprotein components of the mucus that coats the surfaces of cells lining the respiratory, digestive, and urogenital tracts and, in some amphibia, the skin. They protect epithelial cells from infection, dehydration, and physical or chemical injury, and they aid the passage of materials through a tract. Individual organisms make several structurally different mucins, and a given mucin may be found in more than one organ (see Supplemental Material). Members of the mucin family differ considerably in size. Some are small, containing a few hundred amino acid residues, whereas others contain several thousand residues and are among the largest known proteins. Irrespective of size, all mucin polypeptide chains have domains rich in threonine and/or serine whose hydroxyl groups are in O-glycosidic linkage with oligosaccharides. These domains are composed of tandemly repeated sequences that vary in number, length, and amino acid sequence from one mucin to another (1). Carbohydrate may account for up to 90% of the weight of a mucin. There are two types of mucins, membrane-bound and secreted. Of the human mucins, two are membrane-bound (MUC1 and MUC4) (2, 3) and four are secreted (MUC2, MUC5AC, MUC5B, and MUC7) (4–7). The three other mucins (MUC3, MUC6, and MUC8) (8–11) cannot be classified. Each human mucin has a counterpart in other animals. For example, porcine submaxillary mucin (PSM) (12), one of the most thoroughly characterized mucins, has a tissue distribution and structure similar to those of MUC5B. An increasing number of proteins that are not mucins also contain highly O-glycosylated domains, called “mucin-like domains.” The functions of mucins depend on their ability to form viscous solutions or gels. Although the highly glycosylated domains of mucins lack secondary structure, they form long, extended structures that are much less flexible than unglycosylated random coils.
The oligosaccharides contribute to this stiffness in two ways: by limiting rotation around peptide bonds and by charge repulsion among neighboring, negatively charged oligosaccharide groups (13). Such long, extended molecules occupy a much greater solution volume than native or denatured proteins with little or no carbohydrate, and this gives aqueous mucin solutions their high viscosity. Mucins protect against infection by microorganisms that bind cell-surface carbohydrates, and mucin genes appear to be up-regulated by bacterial products such as lipopolysaccharide (14). This review summarizes what is known about the polypeptide structures of the secreted mucins and about how some of them, in particular PSM, are assembled via interchain disulfide bonds into molecules with molecular weights in the millions. Membrane-bound mucins, the subject of earlier reviews (1, 15, 16), are not considered.
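A little arithmetic shows how such weights arise. If carbohydrate makes up a fraction f of the total mass, the protein makes up 1 − f, so the total mass equals the polypeptide mass divided by (1 − f). The sketch below assumes an average residue mass of about 110 Da. That figure is a common rule of thumb, not a value from this review. The sketch then combines two upper bounds quoted in this review, a 13,288-residue chain and 90% carbohydrate, purely for illustration. It does not describe any particular mucin, and the function names are hypothetical.

```python
# Back-of-the-envelope mass estimate for a heavily glycosylated mucin chain.
# Assumption: ~110 Da per amino acid residue (rule of thumb; the real
# value depends on amino acid composition).
AVG_RESIDUE_MASS_DA = 110.0


def polypeptide_mass(n_residues: int) -> float:
    """Approximate mass (Da) of the unglycosylated polypeptide."""
    return n_residues * AVG_RESIDUE_MASS_DA


def glycoprotein_mass(protein_mass: float, carb_fraction: float) -> float:
    """Total mass when carbohydrate is `carb_fraction` of the total weight.

    Protein is then (1 - carb_fraction) of the total, so
    total = protein_mass / (1 - carb_fraction).
    """
    if not 0.0 <= carb_fraction < 1.0:
        raise ValueError("carb_fraction must be in [0, 1)")
    return protein_mass / (1.0 - carb_fraction)


if __name__ == "__main__":
    protein = polypeptide_mass(13_288)        # largest chain length cited
    chain = glycoprotein_mass(protein, 0.90)  # upper-bound carbohydrate content
    print(f"polypeptide   ~{protein / 1e6:.2f} MDa")
    print(f"glycosylated  ~{chain / 1e6:.1f} MDa")
    print(f"S-S dimer     ~{2 * chain / 1e6:.1f} MDa")
```

Under these assumptions the script reports roughly 1.46 MDa for the bare polypeptide, 14.6 MDa for the glycosylated chain, and 29.2 MDa for a disulfide-linked dimer. Mucins with less carbohydrate or shorter chains would give proportionally smaller values.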

General Structural Features
Complete amino acid sequences have been described for the frog (Xenopus) integumentary mucins FIM-A.1 (17) and FIM-B.1 (18), PSM (12), RSM (19), MSM (20), MUC2 (4), MUC5B (6), and MUC7 (7), and almost complete sequences for MUC5AC (5) and rat Muc2 (21). The different domains of these mucins are shown in Fig. 1. Many of the domains show sequence identity, and possibly similar functions, across different mucins. These mucins vary greatly in size, from as few as 322 residues to as many as 13,288. The polypeptide sequences were deduced almost entirely by recombinant DNA methods, and the physicochemical properties of some of these mucins have not been determined. Nevertheless, it is well established that the oligosaccharides in many secreted mucins, e.g. PSM (22), show structural microheterogeneity, with GalNAcα-O-Ser/Thr as the sugar-protein linkage to which other sugars are added. Most mucins carry negatively charged sugars, either sialic acid or O-sulfosaccharides.
Tandem Repeat Domains-The number, length, and amino acid sequence of the repeats vary among mucins, as shown in the Supplemental Material. The tandem repeat domains are flanked on either side by other types of domains (Fig. 1). All of the serine and threonine residues in the repeat domain of PSM carry O-linked oligosaccharides (23), but whether this holds for other mucins is not known. The repeats in some mucins have identical sequences, whereas in others the repeat sequence is degenerate. The lack of secondary structure in the repeat domains and their flanking domains suggests that these domains serve as a scaffold for O-linked oligosaccharides (24), whose properties largely determine the properties of a mucin. Light scattering and electron microscopy suggest that these glycosylated domains are semi-rigid, extended structures (13, 25). In many mucins, e.g. PSM (12) and MUC5B (6), the tandem repeat domain is encoded by a single large exon, whereas the remainder of the mucin is encoded by short exons separated by long introns. Many mucins show length polymorphism because multiple alleles encode different numbers of tandem repeats (26). PSM, for example, is encoded by at least three alleles with 99, 110, and 135 repeats, respectively (12).
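The repeat count of an allele is the number of back-to-back copies of the repeat unit. For exact repeats, that count can be read directly off a sequence. The minimal sketch below finds the longest run of identical consecutive copies of a given unit and reports the Ser/Thr fraction of that run, since those residues are the potential O-glycosylation sites. The repeat unit and the flanking sequence are hypothetical, and the helper names are illustrative. As the text notes, many mucins have degenerate repeats, and this exact-match approach would miss those.

```python
def longest_tandem_run(seq: str, unit: str) -> tuple[int, int]:
    """Find the longest run of exact, back-to-back copies of `unit` in `seq`.

    Returns (start, copies), with `start` as a 0-based index (-1 if absent).
    Degenerate repeats, whose copies differ at some positions, are not
    detected; real analyses would use alignment-based repeat finders.
    """
    if not unit:
        raise ValueError("unit must be non-empty")
    best_start, best_copies = -1, 0
    i = seq.find(unit)
    while i != -1:
        copies = 0
        while seq.startswith(unit, i + copies * len(unit)):
            copies += 1
        if copies > best_copies:
            best_start, best_copies = i, copies
        i = seq.find(unit, i + 1)  # also try overlapping start positions
    return best_start, best_copies


def ser_thr_fraction(seq: str) -> float:
    """Fraction of residues that are Ser or Thr (potential O-glycan sites)."""
    return sum(aa in "ST" for aa in seq) / len(seq) if seq else 0.0


if __name__ == "__main__":
    unit = "PTTSTSAP"                 # hypothetical repeat unit
    seq = "MKLQ" + unit * 5 + "CGWE"  # toy sequence, not a real mucin
    start, copies = longest_tandem_run(seq, unit)
    region = seq[start:start + copies * len(unit)]
    print(start, copies, round(ser_thr_fraction(region), 3))
```

For the toy sequence the script prints `4 5 0.625`. The run starts at index 4, contains five copies, and is 62.5% Ser/Thr.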
The Amino-terminal Disulfide-rich D-domains-Several disulfide-rich domains are present in the secreted mucins, except RSM (19), MSM (20), and MUC7 (7), and they are often located at either end of the polypeptide (Fig. 1). The D-domain, a disulfide-rich domain first identified in von Willebrand factor (VWF) (27), is now recognized in many other proteins (28–31). Many secreted mucins contain three NH2-terminal D-domains, designated D1, D2, and D3, and some contain a fourth, D4, at the COOH terminus (Fig. 1). A partial D-domain, D′, lies between D2 and D3 in all secreted mucins and in VWF. Each domain contains up to 30 ½Cys and shows significant sequence identity with the other D-domains, especially at the half-cystines. Comparisons of the sequences of the D-domains and other ½Cys-rich domains are given in the Supplemental Material. The D1-, D2-, and D3-domains of PSM are N-glycosylated when expressed in COS-7 cells (32), but this is not known for other mucins. In PSM and VWF, all of the ½Cys in the D1-, D2-, D3-, and CK-domains are thought to form disulfide bonds. Some of these are intrachain bonds, whereas others are interchain bonds involved in the assembly of PSM and VWF into multimers (see below).
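Once the domains are aligned, the statement that D-domains share sequence identity "especially at the half-cystines" can be quantified. The minimal sketch below assumes the sequences have already been aligned by some other tool. It computes overall percent identity, using one of several common conventions for the denominator, and the fraction of Cys positions in one domain that are also Cys in the other. The function names and the aligned fragments are hypothetical, not real D-domain sequences.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity of two pre-aligned sequences ('-' marks a gap).

    Columns gapped in both sequences are ignored; every other column
    counts toward the denominator. This is one of several conventions.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(x, y) for x, y in zip(a, b) if not (x == "-" and y == "-")]
    if not columns:
        return 0.0
    matches = sum(1 for x, y in columns if x == y and x != "-")
    return 100.0 * matches / len(columns)


def cys_conservation(a: str, b: str) -> float:
    """Fraction of Cys columns in `a` that are also Cys in `b` (same alignment)."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    cys_cols = [i for i, x in enumerate(a) if x == "C"]
    if not cys_cols:
        return 0.0
    return sum(1 for i in cys_cols if b[i] == "C") / len(cys_cols)


if __name__ == "__main__":
    # Hypothetical aligned fragments of two D-domains (not real sequences).
    d1 = "QCSRC-NPEC"
    d3 = "QCSKCDNPQC"
    print(round(percent_identity(d1, d3), 1), cys_conservation(d1, d3))
```

The toy pair prints `70.0 1.0`: the fragments are 70% identical overall, and every Cys is conserved. This is the pattern the text describes, in which identity is highest at the half-cystines.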
The COOH-terminal Disulfide-rich/CK-domains-A 240–325-residue domain with 29–33 ½Cys lies at the COOH terminus of many mucins (4–6, 12, 18, 33, 34) (see Supplemental Material) (Fig. 1). These mucin domains have significant sequence identity with one another and with domains at the carboxyl terminus of other proteins (27, 28, 30). Like the D-domains, they are predicted to have globular structures with α-helices and pleated sheets and few or no free thiols (35). They very likely carry N-linked oligosaccharides at one or more acceptor motifs (NX(S/T)) (36). The first 100–130 residues of this domain have sequence identity with the C-domains of VWF, whereas the last 90–120 residues, at the COOH terminus, have sequence identity with the CK-domain at the COOH terminus of VWF (27). The CK-domains are homologous to the "cystine knot" superfamily of proteins, which includes transforming growth factor β2, nerve growth factor, platelet-derived growth factor, and chorionic gonadotropin (37). The CK-domains of VWF and mucins show significant sequence identity with norrin, a 133-residue protein that, in mutant form, causes Norrie disease in humans. Norrie disease is a rare, X-linked disorder characterized by congenital blindness, mental retardation, and deafness (38). The CK-domain provides the ½Cys that form interchain disulfide bonds between the polypeptide chains of VWF and PSM, and presumably of other mucins (27, 35, 36, 39) (see below).
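Candidate N-linked acceptor sites can be located by a simple pattern scan. The sketch below uses the widely used form of the NX(S/T) rule, in which X may be any residue except proline; that restriction comes from general knowledge and is not stated in the text above. The regular expression uses a lookahead so that overlapping sites, such as in "NNST", are all reported. The helper name and the toy fragment are hypothetical. A sequon is necessary but not sufficient: whether a site is actually glycosylated has to be determined experimentally, which is why the text says "very likely."

```python
import re

# N-glycosylation sequon Asn-X-Ser/Thr, with X any residue except Pro.
# The lookahead lets overlapping sequons (e.g. "NNST") all be reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(seq: str, first_residue: int = 1) -> list[int]:
    """Residue numbers of Asn residues that sit in an N-X-S/T sequon.

    A sequon is necessary but not sufficient: whether a site actually
    carries an N-linked oligosaccharide must be determined experimentally.
    """
    return [first_residue + m.start() for m in SEQUON.finditer(seq.upper())]


if __name__ == "__main__":
    toy = "ANGTNPSANSTNNSTQ"  # hypothetical fragment, numbered from 1
    print(find_sequons(toy))
```

For the toy fragment the script prints `[2, 9, 12, 13]`. The Asn at position 5 is skipped because it is followed by Pro.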
Other Mucin Domains-A B-domain with sequence identity to those in VWF (27) is found in several mucins (4–6, 21) (Fig. 1). Half-cystine-rich domains other than the D- and CK-domains occur in a few mucins (Fig. 1), and P-domains like those in the trefoil factor family (40, 41) (see Supplemental Material) are present in some frog mucins. Epidermal growth factor-like domains are found in other mucins (8, 42, 43).

Assembly of Mucin Multimers
It is well known that the molecular weight of many mucins decreases in the presence of reducing agents (e.g. Ref. 44), suggesting that interchain disulfide bonds maintain mucins in a multimeric state. Studies on the biosynthesis of mucins in tissue explants (45) and in cultured cells (46–50) have confirmed the role of disulfide bonds in the assembly of mucins into multimers. Mucins were recognized to have disulfide-rich domains structurally similar to those in VWF, and VWF forms disulfide-bonded multimers through its disulfide-rich domains (27). Together, these observations pointed to a possible role for these domains in mucin multimer formation. However, the large size of mucin polypeptides and their high carbohydrate content prevented the use of conventional protein-chemistry methods to examine the molecular details of mucin multimer formation. Fortunately, insight into multimer formation has been gained by expressing plasmids encoding mucin domains in mammalian cells. The recombinant proteins are then characterized by SDS-gel electrophoresis and chromatography under reducing and non-reducing conditions. This approach has been particularly successful for PSM (32, 35, 36, 51), on the assumption that the assembly of the expressed domains accurately reflects the assembly of native mucins in vivo. Thus, as illustrated in Fig. 2, PSM is thought to form disulfide-linked dimers through its COOH-terminal CK-domains, and the dimers then form disulfide-bonded multimers through their NH2-terminal D-domains. It is likely that all mucins structurally related to VWF (Fig. 1), in addition to rat Muc2, MUC5AC, BSM, CTM, PGM, and MUC6, form multimers similar to those formed by PSM.
Dimerization through the CK-domains-Two polypeptide chains of PSM form disulfide-linked dimers through their CK-domains soon after their biosynthesis in the endoplasmic reticulum (35, 36). Pulse-chase studies show that dimerization is very rapid and occurs concomitantly with, or soon after, N-glycosylation. N-Glycosylation is required neither for dimer formation nor for subsequent multimer formation, because both processes are unaffected by tunicamycin (32, 35, 36). However, unglycosylated species are poorly secreted and/or rapidly degraded after secretion into the extracellular medium (32). Two observations indicate that dimerization is confined to the endoplasmic reticulum. First, brefeldin A, a compound that disrupts the Golgi complex, has no effect on dimer formation. Second, dimers form before the N-linked oligosaccharides become endoglycosidase H-resistant. After the studies on PSM, rat Muc2 was also reported to form disulfide-linked dimers through its COOH-terminal disulfide-rich domain, which includes the CK-domain (39) (see Supplemental Material). Dimer formation by other mucins has not been examined by expressing plasmids encoding their CK-domains. However, mucins secreted by mucin-producing cells in culture (47–50), including MUC2, MUC5AC, and likely MUC5B and MUC6, appear to form disulfide-linked dimers shortly after their synthesis in the endoplasmic reticulum. In contrast to PSM, N-glycosylation is reported to be required for dimerization of rat Muc2, MUC2, and MUC5AC. The interchain disulfide bonds in PSM dimers have been examined by site-directed mutagenesis (35). Of the 11 ½Cys in the CK-domain, mutation of 8 has no effect on dimer formation. Mutation of the remaining three, at residues 13223, 13244, and 13246, partly impairs dimerization. Cys-13244 and Cys-13246 are the first two half-cystines of the sequence CLCC. This sequence is conserved in all mucins and other proteins containing the CK-domain (Fig. 3) (see Supplemental Material), and it is also critical for interchain disulfide bond formation in VWF (27) and norrin (53). Cys-13223 of PSM, the first half-cystine of the sequence CVGEC, is also required for efficient dimer formation. However, proteins mutated at this residue are poorly secreted (35), suggesting that this motif may be important for folding of the CK-domain in the endoplasmic reticulum. This motif is also conserved in all mucins, VWF, and norrin (Fig. 3) (see Supplemental Material), consistent with an important role in maintaining the structure of the CK-domain.
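Residue numbers such as 13244 refer to positions in the full-length PSM polypeptide. Locating a conserved motif in a sequence fragment therefore requires adding the fragment's starting offset. The hypothetical helper below reports the full-length numbers of the Cys residues in each match of a motif pattern. The fragment and its starting number are invented for illustration and are not the PSM sequence. Real homologs may differ at non-Cys positions; a looser regular expression such as `C.CC` could accommodate that, which is an assumption rather than a claim about any specific mucin.

```python
import re


def motif_cys_numbers(fragment: str, first_residue: int, motif: str) -> list[list[int]]:
    """Full-length residue numbers of the Cys in each match of `motif`.

    `fragment` is a stretch of a longer polypeptide whose first residue is
    numbered `first_residue`; `motif` is a regular expression. Overlapping
    matches are reported via a lookahead with a capturing group.
    """
    pattern = re.compile(f"(?=({motif}))")
    hits = []
    for m in pattern.finditer(fragment):
        matched = m.group(1)
        hits.append([first_residue + m.start() + j
                     for j, aa in enumerate(matched) if aa == "C"])
    return hits


if __name__ == "__main__":
    # Hypothetical fragment whose first residue is numbered 13242;
    # the flanking residues are invented for illustration.
    fragment = "AACLCCGK"
    print(motif_cys_numbers(fragment, 13242, "CLCC"))
```

With the invented offset the script prints `[[13244, 13246, 13247]]`. The first two numbers correspond to the two mutationally sensitive half-cystines named in the text.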
O-Glycosylation of the Repeat Domain-Incorporation of O-linked oligosaccharides into mucins begins after N-glycosylation and disulfide-linked dimer formation. This sequence is suggested by biosynthetic studies on MUC2 (49) and MUC5AC (50) and by cytochemical studies of PSM (54, 55). O-Glycosylation of PSM begins when the dimers reach the cis-Golgi compartments. Electron microscopy has located both the GalNAc transferase that forms the GalNAc-Ser/Thr linkages and the mucin precursors bearing only GalNAc in the cis-Golgi of mucous cells of submaxillary glands (54, 55). Moreover, unglycosylated mucin precursors are detected only in the lumen of the endoplasmic reticulum (55). Other cells expressing secreted mucins, including intestinal goblet cells (56), also appear to initiate O-glycosylation in the cis-Golgi. In certain mucin-producing cell lines, however, O-glycosylation is found to begin in the endoplasmic reticulum (46, 57). Biosynthesis of the O-linked oligosaccharides is completed in the medial- and trans-Golgi compartments, where the glycosyltransferases required for elongation and termination of the oligosaccharides are located (58).
Multimerization through the D-Domains-Expression in COS-7 cells of plasmids encoding the three D-domains of PSM has shown that these domains form interchain disulfide bonds between disulfide-linked dimers, giving very high molecular weight mucin multimers (32). Multimer formation differs from dimer formation in several respects. Brefeldin A, which disrupts the Golgi complex, inhibits multimer formation, indicating that multimers form in the Golgi complex. Compounds that raise the pH of the trans-Golgi compartments, such as chloroquine and monensin, also inhibit multimer formation but not dimerization (32). Bafilomycin, a specific inhibitor of the vacuolar H⁺-ATPase that keeps the trans-Golgi compartments slightly acidic, also inhibits multimer formation. These observations suggest that the interchain disulfide bonds that give rise to multimers form at slightly acidic pH in the trans-Golgi compartments, through ½Cys residues in the D-domains. The molecular weights of the multimers cannot be assessed accurately by SDS-gel electrophoresis, because under non-reducing conditions the multimers are too large to enter the running gel. However, species the size of trimers were observed when the three D-domains were expressed together (32), suggesting that trimer formation from disulfide-linked dimers is one step in multimerization. Such multimers are likely branched structures, as indicated in Fig. 2. Recombinant PSM lacking the glycosylated domains is secreted from COS-7 cells as both dimers and multimers, indicating that, as with VWF, not all dimers are converted to multimers (32). Site-directed mutagenesis has identified Cys-1199 in the D3-domain of PSM as a likely candidate for forming one of the interchain disulfide bonds in mucin multimers (51). It is the first residue of the sequence CSWRYEPCG, which is highly conserved in secreted mucins (Fig. 3) (see Supplemental Material), and the analogous ½Cys in VWF is required for its multimerization (27). In contrast, Cys-1276 does not form interchain disulfide bonds in PSM (51), although its counterpart in VWF is suggested to form such bonds (27). The other half-cystines in the D-domains that form interchain disulfide bonds have not been identified.
VWF multimers are formed from prepro-VWF, which is cleaved intracellularly by a subtilisin-like protease (furin) after Arg-763 in the sequence motif RSKR (residues 760–763) in the D′-domain (27). The released propeptide contains the D1- and D2-domains and is essential for multimer formation, although cleavage itself is not. Cleavage may not be essential for mucin multimerization, because the D′-domains of mucins lack the sequence motif required for proteolytic cleavage of prepro-VWF. Consistent with the absence of this motif from the D′-domain of PSM (see Supplemental Material), the D-domains of PSM are not cleaved when expressed in COS-7 or MOP-8 cells (32). However, some proteolytic processing of mucins is possible. Recent studies show cleavage in the COOH-terminal region of MUC2 (59), MUC5B (60), and rat Muc2 (61), although any role for such cleavages in mucin assembly is unknown. Moreover, these studies did not rule out proteolytic cleavage during purification of the mucins.
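The argument that mucin D′-domains lack the VWF cleavage motif amounts to a pattern check. Furin's recognition sequence is commonly written R-X-(K/R)-R, with cleavage after the final Arg; the VWF motif RSKR quoted above fits this pattern. The sketch below scans a sequence for that consensus and reports the P1 Arg of each candidate site. It is a simplification: furin can also act on some R-X-X-R sites, and a match does not prove cleavage in vivo. The function name is hypothetical.

```python
import re

# Furin consensus R-X-(K/R)-R; cleavage occurs after the final Arg (P1).
# Lookahead so overlapping candidate sites are all reported.
FURIN_SITE = re.compile(r"(?=R.[KR]R)")


def furin_p1_positions(seq: str, first_residue: int = 1) -> list[int]:
    """Residue numbers of the P1 Arg of each candidate furin site."""
    return [first_residue + m.start() + 3 for m in FURIN_SITE.finditer(seq)]


if __name__ == "__main__":
    # The four-residue VWF motif quoted in the text, numbered from Arg-760.
    print(furin_p1_positions("RSKR", first_residue=760))  # [763]
    # A made-up stretch lacking the motif.
    print(furin_p1_positions("CSKRAEQG"))                 # []
```

Applied to a real D′-domain sequence taken from a database, an empty result would be consistent with the text's point that cleavage at this site is not expected.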
The prediction that PSM forms branched multimers implies that branches should be visible on electron microscopy of mucins. It is generally argued that mucins form linear polymers (e.g. see Ref. 45), a view supported by electron microscopy (24). Moreover, VWF forms dimers through its CK-domains, and the dimers form linear multimers through their D-domains (27). It is quite possible that some mucins with disulfide-rich domains do not form multimers in the manner proposed for PSM. However, mucins are highly susceptible to proteolysis during isolation (62, 63), and further electron microscopic studies should be made on well characterized preparations. Of interest is a recent report describing branched structures for MUC5B in respiratory secretions of asthmatic individuals (64). Studies on MUC2 support additional mechanisms of mucin assembly. LS174T cells synthesize soluble, disulfide-linked MUC2 dimers, but the higher molecular weight species are water-insoluble (65). The water-insoluble species are apparently assembled in the Golgi complex, after initial O-glycosylation, by a pH-independent process. These insoluble complexes are partly maintained by non-reducible chemical bonds of unknown nature (65). Whether the complexes are secreted is not known, although MUC2 in the intestine is thought to be part of an insoluble glycoprotein complex (59). Other studies suggest that MUC2 is assembled into large, soluble, disulfide-linked oligomers/multimers (47, 66). Clearly, MUC5AC (67) and MUC5B (60) are large, soluble, gel-forming mucins stabilized by disulfide bonds.
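The linear-versus-branched question can be phrased in graph terms. Treat each disulfide-linked dimer as a node and each D-domain-mediated interchain link as an edge. A connected, acyclic arrangement in which no dimer is linked to more than two others is linear, as proposed for VWF. A dimer linked to three or more others is a branch point, as in the branched model for PSM. The sketch below is only a topological abstraction with a hypothetical function name. It says nothing about which half-cystines form the links or how many links a real dimer can make.

```python
def classify_multimer(n_dimers: int, links: list[tuple[int, int]]) -> str:
    """Classify the topology of a multimer assembled from disulfide-linked dimers.

    Dimers are nodes numbered 0..n_dimers-1; each (a, b) pair is one
    D-domain-mediated interchain link between two different dimers.
    Returns 'disconnected', 'cyclic', 'branched', or 'linear'.
    """
    if n_dimers < 1:
        raise ValueError("need at least one dimer")
    adjacency = {i: set() for i in range(n_dimers)}
    for a, b in links:
        if a == b:
            raise ValueError("a link must join two different dimers")
        adjacency[a].add(b)
        adjacency[b].add(a)

    # Depth-first search from dimer 0 to check that every dimer is joined in.
    seen, stack = {0}, [0]
    while stack:
        for neighbour in adjacency[stack.pop()]:
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    if len(seen) != n_dimers:
        return "disconnected"

    # A connected graph with exactly n - 1 distinct links has no cycles (a tree).
    distinct_links = sum(len(nbrs) for nbrs in adjacency.values()) // 2
    if distinct_links != n_dimers - 1:
        return "cyclic"

    # In a tree, a dimer linked to three or more others is a branch point.
    if any(len(nbrs) >= 3 for nbrs in adjacency.values()):
        return "branched"
    return "linear"


if __name__ == "__main__":
    print(classify_multimer(4, [(0, 1), (1, 2), (2, 3)]))  # linear chain
    print(classify_multimer(4, [(0, 1), (0, 2), (0, 3)]))  # one branch point
```

The two calls print `linear` and `branched`. Each multimer contains twice as many polypeptide chains as dimers.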
As described above, the assembly of PSM and VWF (27) involves dimerization in the endoplasmic reticulum and multimerization in the trans-Golgi compartments. The molecular mechanisms that permit this compartmentalization are not known, but the NH2-terminal D-domains and the CGLCG motifs in the D1- and D3-domains seem to play critical roles (51). Plasmids encoding only the D1- and D2-domains, the D1- and D3-domains, or the D3-domain of PSM produced mucin oligomers in the presence of monensin. This suggests that the three domains must be contiguous to prevent multimerization at the non-acidic pH of the endoplasmic reticulum and the cis- and medial-Golgi compartments. Replacing the two ½Cys of the CGLCG motif in the D3-domain with alanine likewise permits formation of multimers in the presence of monensin (51). Thus, the motif in the D3-domain prevents multimerization of mucin in the non-acidic endoplasmic reticulum and cis/medial-Golgi compartments. Replacing the two ½Cys of the CGLCG motif in the D1-domain with alanine dramatically reduces the rate of formation of disulfide-linked multimers (51). This observation suggests that multimerization at low pH in the acidic trans-Golgi compartments requires the motif in the D1-domain. Multimerization of VWF also requires the CGLCG motif in its D1-domain (27), although a role for the motif in its D3-domain has not been reported. VWF has another CGLCG motif, in its D2-domain, that is also required for its assembly (27). However, among the mucins structurally related to VWF, only MUC5AC and MUC5B have CGLCG motifs in their D2-domains (see Supplemental Material). The exact roles of the CGLCG motifs remain unknown. Similar motifs, however, occur in the active sites of proteins that catalyze disulfide bond formation during protein folding, such as protein disulfide isomerase (Fig. 3). This raises the question of whether the CGLCG motifs participate directly in forming disulfide bonds in mucins.
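The compartment-specific behavior described above can be summarized as a small rule table. The sketch below encodes the observations as the text interprets them:

- At non-acidic pH, multimerization is suppressed when the three D-domains are contiguous and the D3 CGLCG motif is intact. This covers the endoplasmic reticulum, the cis- and medial-Golgi, and a trans-Golgi neutralized by chloroquine, monensin, or bafilomycin.
- Losing either condition permits multimerization at non-acidic pH.
- In the acidic trans-Golgi, multimerization requires the D1 CGLCG motif. The "dramatically reduced" rate seen with the mutant is treated here as no multimerization.

All names are hypothetical, the rules are a qualitative simplification rather than a mechanism, and untested combinations are resolved only by the order of the checks.

```python
from enum import Enum


class Compartment(Enum):
    ER = "endoplasmic reticulum"
    CIS_GOLGI = "cis-Golgi"
    MEDIAL_GOLGI = "medial-Golgi"
    TRANS_GOLGI = "trans-Golgi"


# Only the trans-Golgi is treated as acidic in this toy model.
ACIDIC_COMPARTMENTS = {Compartment.TRANS_GOLGI}


def multimerization_expected(
    compartment: Compartment,
    *,
    acidification_blocked: bool = False,  # e.g. chloroquine, monensin, bafilomycin
    d1_cglcg_intact: bool = True,
    d3_cglcg_intact: bool = True,
    d_domains_contiguous: bool = True,    # D1, D2, D3 expressed together
) -> bool:
    """Qualitative prediction of D-domain-mediated multimer formation.

    Dimerization through the CK-domains is not modeled; per the text it
    occurs in the ER and is unaffected by these treatments.
    """
    acidic = compartment in ACIDIC_COMPARTMENTS and not acidification_blocked
    if not acidic:
        # At non-acidic pH, multimerization is suppressed only when the D3
        # motif is intact and the three D-domains are contiguous.
        suppressed = d3_cglcg_intact and d_domains_contiguous
        return not suppressed
    # Acidic trans-Golgi: multimerization requires the D1 motif.
    # Simplification: a "dramatically reduced" rate is treated as none.
    return d1_cglcg_intact


if __name__ == "__main__":
    for comp in Compartment:
        print(comp.value, multimerization_expected(comp))
```

For wild-type PSM the loop prints `False` for the endoplasmic reticulum, cis-Golgi, and medial-Golgi and `True` for the trans-Golgi. This matches the compartmentalization described in the text.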

Concluding Remarks
Much progress has been made recently in understanding the structure and assembly of secreted mucins, but much work remains. Other members of the mucin family should be identified, and their structures and mechanisms of assembly into disulfide-bonded multimers elucidated. The pairing of half-cystines in the many disulfide bonds of the globular domains must be established, and the role of chaperones in folding these domains must also be determined. The molecular basis for the regulated/polarized transport of mucins should be explored. Such studies will be needed to gain further insight into the exact biological roles of mucins.