The Cadherin Cytoplasmic Domain Is Unstructured in the Absence of β-Catenin

Cadherins are single pass transmembrane proteins that mediate Ca2+-dependent homophilic cell-cell adhesion by linking the cytoskeletons of adjacent cells. In adherens junctions, the cytoplasmic domain of cadherins binds to β-catenin, which in turn binds to the actin-associated protein α-catenin. The physical properties of the E-cadherin cytoplasmic domain and its interactions with β-catenin have been investigated. Proteolytic sensitivity, tryptophan fluorescence, circular dichroism, and 1H NMR measurements indicate that the murine E-cadherin cytoplasmic domain is unstructured. Upon binding to β-catenin, the domain becomes resistant to proteolysis, suggesting that it becomes structured upon binding. Cadherin-β-catenin complex stability is only modestly dependent on ionic strength, indicating that, contrary to previous proposals, the interaction is not dominated by electrostatics. Comparison of 18 cadherin sequences indicates that their cytoplasmic domains are unlikely to be structured in isolation. This analysis also reveals the presence of PEST sequences, motifs associated with ubiquitin/proteasome degradation, that overlap the previously identified β-catenin-binding site. It is proposed that binding of cadherins to β-catenin prevents recognition of degradation signals that are exposed in the unstructured cadherin cytoplasmic domain, favoring a cell surface population of catenin-bound cadherins capable of participating in cell adhesion.

The formation and maintenance of solid tissues depends upon specific and regulated intercellular adhesion (1). Cadherins are single pass transmembrane adhesion proteins that link the cytoskeletons of adjacent cells in two kinds of intercellular junctions: the adherens junction and the desmosome. These structures play a critical role in tissue development, including cell segregation, condensation, polarization, and differentiation. Cadherin-mediated linkage of cytoskeletal networks imparts resistance to mechanical stress and enables concerted motions required by morphogenic processes. Defects in cadherin-mediated adhesion are associated with several characteristics of malignant transformation, such as dedifferentiation, high mobility, and invasive growth (2,3).
Adherens junctions are sites of cell-cell contact that link the actin cytoskeletons of adjacent cells (4). Cadherin extracellular domains on opposing membranes mediate specific Ca2+-dependent, homotypic interactions. The cytoplasmic domains bind to β-catenin, which in turn binds to the actin-associated protein α-catenin (4,5). An analogous adhesion system exists in Drosophila, with armadillo and DE-cadherin the orthologues of β-catenin and E-cadherin, respectively (6). Deletion mutagenesis studies have mapped the regions of E-cadherin and β-catenin required for association. The C-terminal 72 residues of E-cadherin are necessary and sufficient for β-catenin binding (7), and a 30-amino acid stretch within this region has been proposed to be the "core" β-catenin binding sequence (8).
The primary structure of β-catenin consists of an N-terminal region of 140 amino acids, followed by a 524-residue domain that contains 12 repeats of 42 amino acids known as armadillo (arm) repeats (9) and a 119-residue C-terminal tail. The arm repeat domain is required for association with cadherins (10,11). The three-dimensional structure of the arm repeat domain showed that each arm repeat comprises three helices, with the repeats packing to form a superhelix of helices (12). The superhelix features a shallow groove with a positively charged surface potential. The core β-catenin-binding region of E-cadherin, which has a calculated pI of 3.3, was proposed to bind within the positively charged groove presented by the β-catenin armadillo domain (12).
Several mechanisms appear to modulate cadherin-based adhesion. Cadherins associate with β-catenin shortly after biosynthesis, while still in the endoplasmic reticulum, and the two proteins move together to the cell surface, where they associate with α-catenin (13). Failure of cadherins to associate with β-catenin leads to retention in the endoplasmic reticulum and degradation of cadherin (14). Adherens junction formation is also affected by phosphorylation (15,16). For example, phosphorylation of serines in the cadherin cytoplasmic tail by casein kinase II and glycogen synthase kinase-3β increases the affinity of cadherin for β-catenin (16). Moreover, adherens junctions are enriched in protein-tyrosine kinases and phosphatases, some of which bind the cadherin-catenin complex directly (17-20). Tyrosine kinases target several adherens junction components that could modulate junctional stability, including β-catenin (21) and the arm repeat protein p120ctn (22-24), which binds the cadherin cytoplasmic domain at a site distinct from β-catenin (25,26).
β-Catenin also plays a central role in the Wnt/Wg growth factor signaling pathway that controls cell fate determination during embryogenesis (reviewed in Ref. 27). In this role β-catenin acts as a transcriptional coactivator when bound to members of the lymphoid enhancer factor/T-cell factor (Lef/Tcf) transcription factor family (28,29). Wnt signaling activates transcription by blocking or slowing the normally rapid turnover of β-catenin, thereby elevating cytosolic levels of β-catenin and promoting formation of an active β-catenin-transcription factor complex (27). The cytosolic concentration of β-catenin is normally maintained below the signaling threshold by a multiprotein complex that targets β-catenin for ubiquitination and proteasomal degradation. This protein complex contains APC, the product of the adenomatous polyposis coli gene, the serine/threonine kinase GSK3β, and Axin. The β-catenin binding sequences in LEF-1 and APC are largely electronegative, and recent mutagenesis studies have indicated that they bind within the positively charged groove formed by the armadillo repeat region of β-catenin (30). It is known that cadherin, APC, and LEF-1 compete for binding to the arm repeat domain of β-catenin (11,30), and it is likely that all three ligands bind to overlapping portions of the positively charged groove. Thus, depending on which ligand is bound, β-catenin can have one of several distinct fates: adhesion complex component, transcriptional coactivator, or substrate for proteasomal degradation. As a first step toward understanding the cadherin-β-catenin interaction, we have used a variety of biophysical and biochemical techniques to characterize the cadherin cytoplasmic tail in the absence and presence of β-catenin. Our analysis shows that the cadherin cytoplasmic tail is unstructured in isolation but appears to become structured upon binding to β-catenin. Comparison of a variety of cadherin sequences reveals the presence of sequence motifs associated with proteasomal degradation. It is proposed that these properties are associated with the regulation of cadherin turnover and cellular adhesiveness.

EXPERIMENTAL PROCEDURES
Protein Expression and Purification-Murine β-catenin and the cytoplasmic domains of murine E-cadherin (Ecyto, Arg580-Asp728) and Drosophila DE-cadherin (DEcyto, Gln1350-Ile1507) were expressed in Escherichia coli as C-terminal fusions to glutathione S-transferase (GST). Expression constructs were designed to leave a minimal number of additional residues (Gly-Ser-Pro for β-catenin and Gly-Ser for the cadherins) at the N termini after cleavage of the GST fusion. No additional residues were added to the C termini. The bacterially expressed, recombinant cadherin cytoplasmic domains are designated rEcyto or rDEcyto.
β-Catenin cDNA in pBluescript SKII+ (31) was digested with NdeI, and the resulting overhangs were filled with DNA polymerase I large (Klenow) fragment. Further digestion with SalI yielded the desired insert, which was ligated into SmaI-SalI-digested pGEX-KG (32). The sequence encoding the DE-cadherin cytoplasmic domain was amplified by polymerase chain reaction from a Drosophila head ZAP® II cDNA library (Stratagene, La Jolla, CA) using primers 5'-GCGCCCGGATCCCAGAAGAAGCAGAAGAAT-3' and 5'-GCGCCCGATTTCTTTAGATGCGCCAGCCCTGGTC-3'. The resulting fragment was digested with EcoRI and BamHI and ligated into EcoRI-BamHI-digested pGEX-KG. The identity of the polymerase chain reaction product was verified by DNA sequencing and by its ability to specifically bind armadillo and β-catenin (data not shown). The GST-E-cadherin cytoplasmic domain fusion construct has already been described (33).
Transformed cells were grown in Super Broth, induced with 0.2 mM isopropyl-1-thio-β-D-galactopyranoside and harvested by centrifugation 3 h after induction, and the cell paste was stored at −70°C. Thawed cell paste was treated with protease inhibitors (2 mM phenylmethylsulfonyl fluoride, 2 mg/ml aprotinin, and 4 mg/ml pepstatin-A) and deoxyribonuclease I, and the cells were lysed in a French pressure cell. Fusion proteins were batch affinity purified from lysates with glutathione-agarose beads (Sigma). GST-β-catenin fusion protein was obtained by eluting the protein with a buffer containing 50 mM reduced glutathione. E- and DE-cadherin cytoplasmic domains and β-catenin were obtained by cleaving the glutathione-agarose-bound fusion proteins with bovine thrombin (Sigma). Thrombin can cleave β-catenin internally at Arg90 or Arg95. Thus, digestion of the GST-β-catenin fusion yields full-length β-catenin and two other major species. These fragments, which differ only at their N termini, copurify and are collectively named β76. Anion exchange and size exclusion chromatography were used to purify all protein products to near homogeneity.
Cadherin-β-Catenin Stoichiometry Experiments-Mixtures of rEcyto or rDEcyto and various β-catenin constructs were incubated for more than 1 h at 4°C and injected onto an Amersham Pharmacia Biotech HR 10/30 Superdex 200 size exclusion column equilibrated with 50 mM Tris-HCl, pH 8, 200 mM NaCl, 20 mM EDTA, and 1 mM DTT.
Endogenous versus Recombinant E-Cadherin-β-Catenin Binding Studies-MDCK cells (type II J) were maintained in Dulbecco's modified Eagle's medium supplemented with 10% fetal bovine serum. Rabbit polyclonal antibody for the E-cadherin cytoplasmic domain was described previously (33). Mouse monoclonal antibodies for E-cadherin and β-catenin were purchased from Transduction Laboratories. Polyclonal anti-E-cadherin and preimmune sera were covalently coupled to protein A-Sepharose. In each case, a ratio of 10 ml of serum to 75 ml of a 50% slurry of protein A-Sepharose was incubated for 12 h at 4°C. The Sepharose was washed with 0.2 M borate, pH 9.0, and the antibodies were cross-linked to the protein A with dimethylpimelimidate (Pierce).
Precleared extract supernatant was incubated for 2 h at 4°C with Sepharose-immobilized polyclonal E-cadherin antibody. Postincubation Sepharose was washed once with DTEB and twice with NaCl-supplemented DTEB. All washes were for 10 min at 4°C, and 10 different NaCl concentrations were used. Sepharose beads were resuspended in DTEB, centrifuged through a 1 M sucrose pad, washed with additional DTEB, and boiled for 5 min in SDS sample buffer. After separation on 6.5% SDS-PAGE gels, immunoprecipitated proteins were transferred to Immobilon-P polyvinylidene fluoride membrane (Millipore Corp.) and detected with murine monoclonal primary antibodies specific for E-cadherin and β-catenin and 0.1 mCi/ml 125I-labeled goat anti-mouse secondary antibody (ICN Pharmaceuticals, Irvine, CA). Immunoblots were exposed to x-ray film (X-Omat AR; Eastman Kodak Co.) and quantified using a Molecular Dynamics Storm 820 PhosphorImager system.
To compare the behavior of recombinant β-catenin-E-cadherin complexes with that of the endogenous complex, 1:1 stoichiometric mixtures of the recombinant proteins were incubated on ice for 16 h and mock precleared as described above. Immunoprecipitations and washes were carried out as described above. Immunoprecipitates were separated on 12% SDS-PAGE gels and processed for immunoblotting and quantitation as described above.
Circular Dichroism-Circular dichroism data were measured using an Aviv 60DS spectropolarimeter equipped with a Peltier temperature control unit (Hewlett-Packard 89100A). The spectropolarimeter was calibrated with (+)-10-camphorsulfonic acid, and a 1-mm path length was used for all experiments. Spectra of E- and DE-cadherin cytoplasmic domains were measured in 10 mM phosphate, pH 7, at 0°C. These spectra were measured in two parts, with 25 and 10 μM samples being used for the 280-205- and 220-186-nm wavelength ranges, respectively. The two halves of each spectrum were scaled using the known sample concentrations.
NMR Measurements-NMR spectra were acquired using a General Electric GN-Omega instrument operating at 500 MHz and a 680 μM sample of E-cadherin cytoplasmic domain in 5 mM Tris-HCl, pH 8, 10 mM NaCl, 0.35 mM trimethyl silyl propionate, and 50% D2O. Presaturation was used to suppress the H2O peak, and trimethyl silyl propionate was employed as the chemical shift standard. The sample was shimmed briefly before acquisition of one-dimensional 1H NMR spectra of 256 scans each (4096 real points; spectral width, 7000 Hz) at 50, 25, and 3°C. NMR data were processed using Felix, version 2.30, from Biosym Technologies (San Diego, CA). The free induction decay was processed conservatively by Fourier transformation without premultiplication. Following phasing, the base lines were corrected by fitting to a zero order polynomial function.
Poly-L-glutamate versus E-Cadherin Binding Competition-GST-β-catenin fusion protein and glutathione-agarose were mixed for 45 min at room temperature in a binding buffer comprising 200 mM Tris-HCl, pH 8.5, 2 mM DTT, and either 100, 200, or 400 mM NaCl. Poly-L-glutamate (P1818 or P4636, with average molecular masses of roughly 1,000 and 11,000 Da, Sigma) was then added, and the mixtures were incubated an additional 45 min. E-cadherin was added, and after another 45-min incubation, the agarose beads were spun down, washed three times with the appropriate binding buffer, and boiled in reducing SDS-PAGE sample buffer. The sample supernatants were analyzed by SDS-PAGE. In each case, after the addition of E-cadherin, the incubation mixture was 3 μM in GST-β-catenin, 3 μM in E-cadherin cytoplasmic domain, and 0, 3, 9, or 27 mM in poly-L-glutamate. Different E-cadherin and poly-L-glutamate stocks were made using the appropriate binding buffers.
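The molar excesses of poly-L-glutamate quoted later for this competition assay follow directly from the concentrations listed above. The short Python sketch below works through that arithmetic. It assumes the E-cadherin cytoplasmic domain concentration is 3 μM (the reading consistent with the 1000-fold and 9000-fold excesses reported for this experiment), and the function and variable names are illustrative only.

```python
# Worked arithmetic for the competition assay: fold molar excess of
# poly-L-glutamate over the E-cadherin cytoplasmic domain.
# Assumption: E-cadherin is at 3 micromolar; all values are held in
# micromolar so the division stays exact.

ECAD_UM = 3                          # E-cadherin cytoplasmic domain, micromolar
POLYGLU_MM = (0, 3, 9, 27)           # poly-L-glutamate, millimolar


def fold_excess(competitor_um: float, ligand_um: float) -> float:
    """Return the molar ratio of competitor to ligand."""
    if ligand_um <= 0:
        raise ValueError("ligand concentration must be positive")
    return competitor_um / ligand_um


# Convert mM to micromolar (x1000) before taking the ratio.
excesses = {mm: fold_excess(mm * 1000, ECAD_UM) for mm in POLYGLU_MM}

for mm, fold in excesses.items():
    print(f"{mm:>2} mM poly-L-glutamate = {fold:,.0f}-fold molar excess")
```

Under these assumptions the four conditions correspond to 0-, 1000-, 3000-, and 9000-fold molar excesses of polyanion over cadherin.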
Limited Proteolysis of Cadherin-β-Catenin Complexes-Approximately 25 μM recombinant E-cadherin cytoplasmic domain (rEcyto) alone or mixed in 1:1 stoichiometric amounts with either full-length β-catenin or the armadillo repeat region (β59) (12) was subjected to limited proteolysis with subtilisin. A control mixture containing 25 μM rEcyto and 30 μM bovine serum albumin was also digested. Mixtures were incubated at 4°C for 1 h and digested for 20 min at room temperature with subtilisin concentrations ranging from 0.08 to 1.3 μg/ml. Digestions were carried out in a buffered solution comprising 100 mM CHES, pH 9.2, 2 mM CaCl2, and 5 mM DTT. This pH is suboptimal for subtilisin but is required for β59 solubility. In separate experiments, rEcyto-β-catenin complex was digested with endoproteinase Glu-C using concentrations ranging from 0.4 to 27 μg/ml; these reactions were carried out in 50-100 mM Tris-HCl, pH 8.5, 2 mM CaCl2, and 5 mM DTT. rDEcyto-β-catenin complex was also digested with subtilisin as described for rEcyto. Subtilisin digestions were stopped by adding phenylmethylsulfonyl fluoride to a final concentration of 8 mM, and endoproteinase Glu-C digestions were stopped by boiling.

Recombinant Cadherin Cytoplasmic Domains Are Not Folded in Isolation-Fluorescence, circular dichroism, and proton NMR were used to characterize the folded state of recombinant E-cadherin and DE-cadherin cytoplasmic domains (rEcyto and rDEcyto, respectively). Ecyto contains a single tryptophan that is located near the C terminus of the protein. DEcyto has two tryptophans that are near the N and C termini, the latter in a position distinct from the tryptophan in Ecyto. In a folded protein, the tryptophan indole ring is frequently buried within a hydrophobic core or is otherwise shielded from solvent, resulting in a blue shift of the fluorescence maximum relative to free tryptophan (36). The tryptophan emission maximum for β76, a thrombin-generated fragment of β-catenin (see "Experimental Procedures"), is significantly blue-shifted relative to tryptophan alone, whereas the maxima for Ecyto and DEcyto are not (Table I). Thus, the single Ecyto tryptophan and two DEcyto tryptophans appear to be solvent exposed.
Folded and denatured proteins commonly display significantly different fluorescence anisotropy values because of the loss of rotational freedom that occurs when a tryptophan indole ring is buried in a hydrophobic core. β-Catenin and BSA yield anisotropy values of ~0.085 when folded and ~0.035 when denatured with 6 M guanidine HCl (Table I). In contrast, recombinant E- and DE-cadherin cytoplasmic domains have anisotropy values comparable with those observed with denatured proteins, and these values do not change significantly when the cadherins are subjected to denaturants (Table I).

TABLE I footnotes
a Fluorescence maximum was between 300 and 420 nm.
b Anisotropy (r) is defined as (Ipar − Iper)/(Ipar + 2Iper), where Ipar is the intensity of emitted light with the same orientation as the vertically polarized incident light, and Iper is the intensity of emitted light with an orientation perpendicular to the incident light. Each measurement represents the mean anisotropy value derived from eight measurements of a given sample. The standard deviation for a given measurement was typically 0.001-0.002. The number of independent measurements for each sample is given in parentheses.

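The anisotropy definition given for Table I, r = (Ipar − Iper)/(Ipar + 2Iper), can be written as a short calculation. The Python sketch below is illustrative only: the function names and the intensity readings are hypothetical and are not data from this study. It averages replicate readings in the way described for the table, where each reported value is the mean of eight measurements.

```python
# Minimal sketch of the anisotropy calculation, r = (Ipar - Iper)/(Ipar + 2*Iper).
# All names and numbers here are illustrative, not measured values.

def anisotropy(i_par: float, i_per: float) -> float:
    """Anisotropy from parallel and perpendicular emission intensities."""
    total = i_par + 2.0 * i_per
    if total <= 0:
        raise ValueError("total emission intensity must be positive")
    return (i_par - i_per) / total


def mean_anisotropy(readings):
    """Mean anisotropy over replicate (i_par, i_per) readings of one sample."""
    values = [anisotropy(i_par, i_per) for i_par, i_per in readings]
    return sum(values) / len(values)


# Hypothetical replicate readings in arbitrary intensity units.
replicates = [(1.27, 1.00), (1.28, 1.00), (1.26, 1.00)]
print(round(mean_anisotropy(replicates), 3))
```

Equal parallel and perpendicular intensities give r = 0 (fully depolarized emission), which is the limit approached by a freely rotating fluorophore.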
The fluorescence anisotropy data suggest that the Ecyto and DEcyto domains are unfolded under native conditions. However, regions of a folded protein that lack structure, such as large loops or unstructured N and C termini, might give similar results. CD spectroscopy was therefore used to probe for the presence of regular protein secondary structure. The spectra measured for rEcyto and rDEcyto at 0°C are essentially identical and feature a single minimum in mean residue ellipticity at ~202 nm (Fig. 1). This spectrum indicates a lack of secondary structure, which would be expected of an unstructured polypeptide (37,38).
To eliminate the possibility that the cytoplasmic tail has a defined conformation without regular secondary structure, one-dimensional 1H NMR spectra were measured at 3, 25, and 50°C (Fig. 2). At 50°C, peaks with the chemical shift values of random coil peptides (39) are observed at 0.95 ppm (δ and γ protons of Ile, Leu, and Val), 2.10 ppm (ε protons of Met), and 6.81 ppm (Tyr ring protons). Resolved peak doublets at 7.47 and 7.58 ppm are consistent with exposed C7 and C4 Trp protons, as one would find in an unfolded protein. A broad, very low shoulder between 8 and 9 ppm suggests that amide protons are unprotected and in rapid exchange with solvent protons and deuterons. These data indicate that rEcyto is unfolded at 50°C. Spectra recorded at 25 and 3°C are similar to the 50°C spectrum (Fig. 2). However, amide proton peaks appear at 25°C and become more prominent at 3°C, with a single small alkyl peak becoming resolved at 0.70 ppm. The appearance of amide proton peaks may result from secondary structure formation or from a lower rate of solvent exchange resulting from the decreased temperature. Because the amide resonances lack the chemical shift dispersion that accompanies secondary structure formation, the latter explanation is more likely. Thus, rEcyto appears to be largely unstructured at 50, 25, and 3°C.

Recombinant E-Cadherin Cytoplasmic Domain and Endogenous E-Cadherin Have Similar β-Catenin Binding Properties-The β-catenin binding properties of rEcyto were compared with those of E-cadherin isolated from eukaryotic cells. Full-length β-catenin, as well as fragments lacking the first ~90 amino acids (β76; see "Experimental Procedures") or comprising the arm repeats (β59) (12), were used in these experiments. Size exclusion chromatography (Table II) and native gel electrophoresis (data not shown) were used to separate mixtures comprising 2:1, 1.5:1, 1:1, 1:1.5, and 1:2 molar ratios of β76 and rEcyto. Similar experiments were carried out with rEcyto-β-catenin, rDEcyto-β-catenin, and rEcyto-β59 complexes (data not shown). All experimental results were consistent with the 1:1 stoichiometry previously derived using endogenous proteins (13,40). To compare directly the stability of recombinant and endogenous E-cadherin-β-catenin complex, we examined the sensitivity of each complex to salt washes, a common method of testing protein-protein interactions in cell extracts. Endogenous E-cadherin-β-catenin complex was immunoprecipitated from MDCK cell extracts with polyclonal anti-E-cadherin antibodies attached to Sepharose beads. Recombinant β-catenin-Ecyto complexes were prepared in parallel and immunoprecipitated using the same buffer conditions. After washing the immunoprecipitates, the beads were incubated with buffers containing increasing concentrations of NaCl. The amount of complex resistant to dissociation was assayed by Western blots of the post-wash bead-associated proteins with cadherin- and β-catenin-specific antibodies. The stability of the recombinant complex in NaCl was found to be similar to that observed with the endogenous proteins (Fig. 3). Although the two curves agree within experimental error, the recombinant complex was consistently slightly less stable than the endogenous complex. It is possible that the increased stability of the endogenous complex is the result of post-translational modifications, such as phosphorylation (16).

TABLE II
β-Catenin/E-cadherin complex stoichiometry as determined by size exclusion chromatography using purified recombinant proteins
β76, a fragment of β-catenin extending from residue Arg90 or Arg95 to the C terminus, and recombinant E-cadherin cytoplasmic domain (rEcyto) were mixed at various ratios, incubated for more than 1 h at 4°C, and injected onto an Amersham Pharmacia Biotech HR 10/30 Superdex 200 column. Peaks were integrated by photocopying the chart recorder trace, cutting out the peaks, and weighing them. The peak masses for one molar equivalent of β76 and Ecyto were determined in independent runs (data not shown).
a β76 could not be separated from β76-Ecyto complex using this column, so the complex peak and β76 shoulder were cut out and weighed together. No free Ecyto was observed, so the molar equivalents of β76 were calculated assuming the combined peak contained exactly 1 equivalent of β76-Ecyto complex.

Electrostatic Complementarity Is Not a Dominant Factor in E-Cadherin-β-Catenin Complex Stability-The positively charged groove presented by β-catenin and the overall negative charge of the β-catenin-binding regions of E-cadherin, LEF-1, and APC suggest that electrostatic complementarity has a major role in the interaction of these proteins with β-catenin. As a simple test of this hypothesis, we examined the salt dependence of the interaction. As described above (Fig. 3), increasing NaCl concentrations reduce the stability of the E-cadherin-β-catenin complex. However, roughly 50% of the cadherin-β-catenin complex remains in NaCl concentrations as high as 2.5 M, suggesting that electrostatics make only a modest contribution to complex stability. To further gauge the contribution of electrostatics to complex formation, we tested whether the poly-L-glutamate polyanion P1818 (~1000 Da; average length, 10 residues) could act as a competitive inhibitor of rEcyto-β-catenin complex formation. P1818 is a weak competitor of rEcyto for β-catenin binding when present at a 1000-fold molar excess and an effective competitor when present at a 9000-fold molar excess (Fig. 4). Increasing the ionic strength of the incubation mixture by adding NaCl significantly diminishes P1818 competitive inhibition (Fig. 4). The observed inhibition of cadherin binding by P1818 and its salt dependence are consistent with a nonspecific and largely electrostatic association of poly-glutamate with β-catenin. Collectively, these observations suggest that although electrostatic complementarity plays a role in cadherin-β-catenin complex formation, it is not a dominant factor in the stability of the complex.
β-Catenin Binding Protects Ecyto from Proteolysis-The cadherin cytoplasmic domain is unstructured in isolation, but it could become structured or fold upon binding β-catenin. Compact globular domains typically exhibit some resistance to proteolytic degradation, so limited proteolysis is one means of ascertaining structural stability. The relatively nonspecific protease subtilisin and the acid-specific protease endoproteinase Glu-C (V8 protease) both readily digest rEcyto. However, in the presence of β-catenin, what appears to be full-length rEcyto remains at subtilisin (Fig. 5) and endoproteinase Glu-C (data not shown) concentrations that completely degrade rEcyto alone (Fig. 5, lanes 5 and 7 or lanes 12 and 14). Similar protection results were obtained with the DEcyto-β-catenin complex and subtilisin (data not shown). β59, the armadillo repeat domain of β-catenin (12), is thought to contain the entire cadherin-binding site. This fragment also protects rEcyto from degradation but does so less effectively than full-length β-catenin (Fig. 5, lanes 7 and 9 or lanes 14 and 16). BSA-rEcyto mixtures were used to test whether the protection afforded by full-length β-catenin was simply due to the addition of an alternative substrate for subtilisin. The amount of BSA used in the digests was chosen to provide the same number of peptide bonds as full-length β-catenin. Adding BSA protects Ecyto somewhat (Fig. 5, lanes 5, 10, and 11), but the protection is substantially weaker than that afforded by β-catenin and slightly weaker than that provided by β59 (Fig. 5, lanes 9 and 11 or lanes 16 and 18).
Instability Is Reflected in the Sequences of the Cadherin Cytoplasmic Tails-A protein that is normally unstructured in solution might be expected to have a higher than average number of charged/polar residues and a lower than average number of hydrophobic residues. Analysis of 18 type I and type II cadherins shows that they contain on average 41% more charged residues (Arg, Asp, Glu, and Lys) and 29% fewer aliphatic residues (Ile, Leu, Met, Phe, and Val) relative to the average amino acid composition of the proteins in the SWISS-PROT database (Release 38). Given that the average calculated pI for the cytoplasmic tails is 4.4, it is not surprising that most of the increase in charge is from a higher than usual number of aspartic and glutamic acid residues. In most cases the compositional increase in Asp far outweighs that of Glu, as reflected in the "Asp index," which we define as the normalized change in total aspartic acid content relative to total glutamic acid content (Fig. 6). It is interesting to speculate that this bias is related to the lack of tertiary structure seen in the domain; the side chain of Asp is less hydrophobic than that of Glu, making it less likely to form favorable packing interactions with other residues. The aliphatic index (AI), a measure of hydrophobicity and thermostability (41), was also calculated from these sequences. This is a compositional index based upon the sum of the mole percentages of Ala, Val, Ile, and Leu weighted by the relative side chain volumes. Proteins from thermophilic bacteria have been found to have a significantly higher AI value (mean of 92.6, S.D. of 10.6) than proteins from mesophilic organisms (mean of 78.8, S.D. of 14.5). Thus, a higher AI value is correlated with higher thermostability. The sequences from the cadherin cytoplasmic tails have AI values (mean of 64.4, S.D. of 4.2) significantly below that found for mesophilic proteins.
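The compositional measures described above are simple functions of residue mole percentages. The Python sketch below shows one way the charged-residue content and the aliphatic index might be computed from a one-letter sequence. It is a sketch under stated assumptions, not the procedure used in this study: the side chain volume weights (2.9 for Val, 3.9 for Ile and Leu) are the values commonly quoted for this index and are not given in the text, and the function names and the toy sequences used to exercise them are hypothetical.

```python
from collections import Counter

CHARGED = "RDEK"  # Arg, Asp, Glu, Lys, as listed in the text


def mole_percent(seq: str) -> dict:
    """Mole percentage of each residue type in a one-letter sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    counts = Counter(seq)
    return {aa: 100.0 * n / len(seq) for aa, n in counts.items()}


def charged_percent(seq: str) -> float:
    """Summed mole percentage of Arg, Asp, Glu, and Lys."""
    mp = mole_percent(seq)
    return sum(mp.get(aa, 0.0) for aa in CHARGED)


def aliphatic_index(seq: str, a: float = 2.9, b: float = 3.9) -> float:
    """AI = X(Ala) + a*X(Val) + b*(X(Ile) + X(Leu)), X in mole percent.

    The default weights a and b are assumed values (see lead-in).
    """
    mp = mole_percent(seq)
    return (mp.get("A", 0.0)
            + a * mp.get("V", 0.0)
            + b * (mp.get("I", 0.0) + mp.get("L", 0.0)))


# Toy sequences, not cadherin sequences.
print(charged_percent("RDEKAAAA"))
print(round(aliphatic_index("AVIL"), 1))
```

A sequence enriched in Asp and Glu and depleted in Ala, Val, Ile, and Leu scores high on the first measure and low on the second, which is the pattern reported for the cadherin cytoplasmic tails.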
Cadherins appear to be targeted for degradation when not bound to β-catenin (14). Because the PEST sequence motif is correlated with rapid protein turnover in vivo (35), we used the PEST-FIND program to identify potential PEST sequences within the cadherin cytoplasmic domains. PEST sequences contain Pro, Glu or Asp, and Ser or Thr and are flanked by but do not contain basic residues (His, Arg, and Lys). PEST sequences with PEST-FIND scores greater than +5 are considered the best candidates for being degradation signals, but many proteins containing sequences with lower PEST scores are known to be degraded (35). Of the 18 cadherin cytoplasmic domains analyzed, 15 have positive scoring PEST sequences that contain or overlap the serine-rich, minimal β-catenin-binding region in the C-terminal half of the domain; 12 of these sequences score greater than +3, and 4 have values greater than +5 (Fig. 6). This minimal β-catenin-binding region of cadherin also has two features frequently found in synthetic signals that target proteins to the ubiquitin-proteasome system in Saccharomyces cerevisiae: a high content of serine and threonine residues and a frequently recurring sequence motif, (bulky hydrophobic)-(S or T)-(S or T)-(bulky hydrophobic) (42). The sequence motif Leu-Ser-Ser-Leu is very highly conserved within the cadherin cytoplasmic tail and is located within the minimal β-catenin-binding region (Fig. 6). It should also be noted that five of the six type I (E-, N-, R-, P-, and EP-cadherins) and two of the twelve type II (cadherin-8 and cadherin-20) cadherins had positive scoring PEST sequences in the membrane-proximal, N-terminal half of the cytoplasmic domain. All of these membrane-proximal type I cadherin PEST sequences scored greater than +3.5. This region harbors the p120ctn-binding site (25,26).
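The two sequence features described above can be illustrated with a simple pattern search. The Python sketch below is not the PEST-FIND program and computes no PEST-FIND score: it only flags stretches that meet the qualitative definition given in the text (flanked by His, Arg, or Lys, free of those residues, and containing Pro, Asp or Glu, and Ser or Thr), and it locates the (bulky hydrophobic)-(S or T)-(S or T)-(bulky hydrophobic) motif. The minimum stretch length of 12 residues and the set of residues treated as bulky hydrophobic are assumptions made for illustration, as are the function names and the toy sequence.

```python
import re

BULKY = "LIVFMWY"  # assumed set of bulky hydrophobic residues


def pest_candidates(seq: str, min_len: int = 12):
    """Return (start, segment) for basic-residue-free stretches that are
    flanked on both sides by H, R, or K and contain P, D/E, and S/T.
    Qualitative filter only; min_len is an assumed threshold."""
    seq = seq.upper()
    hits = []
    for m in re.finditer(r"[^HRK]+", seq):
        segment = m.group()
        # A maximal run of non-basic residues is flanked by basic residues
        # exactly when it touches neither end of the sequence.
        flanked = m.start() > 0 and m.end() < len(seq)
        has_p = "P" in segment
        has_acid = any(aa in segment for aa in "DE")
        has_hydroxyl = any(aa in segment for aa in "ST")
        if flanked and len(segment) >= min_len and has_p and has_acid and has_hydroxyl:
            hits.append((m.start(), segment))
    return hits


def motif_positions(seq: str):
    """0-based start positions of the hydrophobic-(S/T)-(S/T)-hydrophobic
    motif; the lookahead allows overlapping matches."""
    pattern = rf"(?=[{BULKY}][ST][ST][{BULKY}])"
    return [m.start() for m in re.finditer(pattern, seq.upper())]


toy = "KPESTPESTPESTLSSLR"  # toy sequence, not a cadherin sequence
print(pest_candidates(toy))
print(motif_positions(toy))
```

A filter of this kind only nominates regions for inspection; ranking them requires the scoring implemented in PEST-FIND itself.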
We calculated the instability index (II), a measure that is based upon a correlation between the stability of a protein in vivo and the frequency of certain dipeptides in its sequence (43), for each of the 18 sequences analyzed above. Index values greater than 40 are indicative of metabolic instability, which is defined as an in vivo half-life of less than 5 h. The average cadherin cytoplasmic tail II value was 56.6 (Fig. 6), and only the cadherin-8 cytoplasmic tail was predicted to be stable.

DISCUSSION
The cytoplasmic tail is the most highly conserved domain among type I cadherins (34). With a length of ~150 residues, it is easily large enough to be an independently folded structural unit. We expressed the E- and DE-cadherin cytoplasmic domains in E. coli and purified them to homogeneity. Tryptophan fluorescence, circular dichroism, and one-dimensional proton NMR studies all lead to the same surprising conclusion: rEcyto and rDEcyto are unfolded in solution. The recombinant domains appear to have the same biochemical properties as those of endogenous cadherins. They form 1:1 stoichiometric complexes with β-catenin, and the stability of the rEcyto-β-catenin complex as a function of salt concentration is comparable with that of endogenous E-cadherin-β-catenin complex. In addition, rEcyto and rDEcyto can be concentrated to 40 and 20 mg/ml, respectively, without the aggregation that would occur with a "misfolded" protein. Thus, it is unlikely that the recombinant proteins are simply misfolded in bacteria, and it is highly probable that endogenous cadherin cytoplasmic domains are also unfolded.
β-Catenin appears to protect full-length rEcyto from degradation under conditions that completely degrade rEcyto alone. The β-catenin-binding site of E-cadherin is thought to lie within the C-terminal 72 residues of the 150-amino acid cytoplasmic tail, with the minimal binding site encompassing just 30 residues (8). Given the relatively short binding sequence and the unfolded state of Ecyto, proteolytic protection by sequestration of the entire domain seems unlikely. The simplest explanation is that Ecyto adopts a defined conformation upon binding to β-catenin. We interpret the observed difference in protection provided by full-length β-catenin and β59 as evidence that regions of β-catenin outside the arm repeat domain interact with E-cadherin. A structuring of cadherin upon binding to β-catenin may facilitate interactions between cadherins and other junctional components such as p120ctn.
The acidic nature of the β-catenin-binding regions of cadherins, APC, and LEF/Tcf transcription factors led us to propose that these otherwise unrelated proteins bind as extended polypeptides in the electrostatically positive groove present in the arm repeat region of β-catenin (12). Indirect support for the groove binding model comes from the observation that nuclear localization signal peptides bind in an extended conformation within the groove of the karyopherin α arm repeat domain (44). Recent site-directed mutagenesis data have shown that APC and LEF-1 bind within the groove of β-catenin (30). As cadherins compete with APC and LEF-1 for binding to β-catenin (11, 30), it is likely that cadherin also interacts with this region.
FIG. 6. Sequence alignment and analysis of cadherin cytoplasmic domains. The cytoplasmic domains of six type I and twelve type II cadherins were analyzed, with all except those marked as originating from X. laevis (Xl) or R. norvegicus (Rn) being of human origin. The alignment includes only the C-terminal PEST region of the cadherins, with the residue numbers for the mature cadherin-1 protein (CAD 1) shown above. PEST sequences are shaded, and a box has been placed around the highly conserved (Leu-Ser-Ser-Leu) motif. The PEST-FIND score (PEST) for the highlighted sequences, as well as the instability (II), aliphatic (AI), and aspartic acid (AAI) indices for the entire cytoplasmic domain, are shown to the right of each sequence. The minimal β-catenin-binding region is indicated at the bottom. A cluster of phosphorylated serines identified in cadherin-1 is shown with asterisks.
Although the groove appears to be the binding site for β-catenin ligands, our results indicate that electrostatic complementarity is not a major contributor to the stability of the cadherin-β-catenin complex. This is consistent with experimental and theoretical studies showing that the energetically favorable interactions between complementary charges do not always compensate for the unfavorable desolvation of the participating charged groups (45, 46). Although charged and polar interactions may not have a dominant role in complex stability, they are generally important for specificity, because the cost of desolvating polar groups upon burial in an interface must be offset by electrostatically complementary interactions. In addition, long range electrostatic interactions may be used to enhance rates of association, because they can provide attractive forces even during molecular rotation and realignment (47).
Electrostatic complementarity between β-catenin and E cyto may simply be a consequence of maintaining the cadherin cytoplasmic domain in an extended, unstructured state that can readily bind to β-catenin. When unfolded, a typical globular protein has poor solubility properties and is prone to aggregation. Thus, it is not surprising that the cadherin cytoplasmic domains have compositions skewed toward charged amino acids at the expense of aliphatic residues. Such a composition is likely to favor an extended conformation, because charge-charge repulsion will reduce the likelihood of self-association. β-Catenin may present an electrostatically positive surface to complement the acidic character required to maintain cadherin as an unstructured protein.
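The compositional skew described here can be quantified with a few lines of code. The following is a minimal sketch, not the procedure used for Fig. 6: it assumes the standard Ikai definition of the aliphatic index (mole percent Ala + 2.9 x Val + 3.9 x (Ile + Leu)), counts Asp/Glu as acidic and Lys/Arg as basic (His is treated as neutral, an assumption), and the function name `composition_summary` and the test sequence are hypothetical.

```python
from collections import Counter

ACIDIC = "DE"   # Asp, Glu
BASIC = "KR"    # Lys, Arg (His treated as neutral here; an assumption)


def composition_summary(seq: str) -> dict:
    """Summarize charge and aliphatic content of a one-letter protein sequence."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        raise ValueError("empty sequence")
    counts = Counter(seq)

    def mole_percent(aa: str) -> float:
        return 100.0 * counts[aa] / n

    # Ikai aliphatic index: relative volume occupied by aliphatic side chains.
    aliphatic_index = (
        mole_percent("A")
        + 2.9 * mole_percent("V")
        + 3.9 * (mole_percent("I") + mole_percent("L"))
    )
    acidic = sum(counts[a] for a in ACIDIC)
    basic = sum(counts[a] for a in BASIC)
    return {
        "length": n,
        "aliphatic_index": aliphatic_index,
        "charged_fraction": (acidic + basic) / n,
        "net_charge": basic - acidic,  # crude integer estimate near neutral pH
    }


# Hypothetical toy sequence, not a cadherin tail.
summary = composition_summary("AVILDDEEKR")
```

A sequence with a high `charged_fraction`, a negative `net_charge`, and a low `aliphatic_index` would match the qualitative profile described for the cadherin cytoplasmic domains.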
The lack of structure seen in the uncomplexed cadherin tail may be related to the turnover of cadherins. E-cadherin has a relatively short half-life (<5 h) in MDCK cells (48). β-Catenin associates with E-cadherin shortly after cadherin synthesis (13), and mutants deficient for β-catenin binding are retained within the endoplasmic reticulum and are rapidly degraded (14). We have shown here that the majority of cadherin cytoplasmic domains contain PEST sequence motifs. The PEST sequences may function as signals for the degradation of "catenin-free" or uncomplexed cadherins.
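To illustrate what a PEST motif search looks for, the sketch below implements only the candidate-selection step commonly described for PEST-FIND-style searches: a stretch of at least 12 residues that contains no Lys, Arg, or His, is flanked by those residues, and includes at least one Pro, one Asp or Glu, and one Ser or Thr. It does not compute a PEST-FIND score, it treats the sequence termini as valid flanks (an assumption), and the function name `pest_candidates` and the example sequence are hypothetical.

```python
import re


def pest_candidates(seq: str, min_len: int = 12) -> list:
    """Return (start, end, region) for candidate PEST regions.

    Positions are 1-based and inclusive. Candidates are maximal runs
    free of K/R/H, so each run is bounded by a positively charged
    residue or by a sequence terminus.
    """
    seq = seq.upper()
    hits = []
    for match in re.finditer(r"[^KRH]+", seq):
        region = match.group()
        has_pro = "P" in region
        has_acidic = any(c in region for c in "DE")
        has_hydroxyl = any(c in region for c in "ST")
        if len(region) >= min_len and has_pro and has_acidic and has_hydroxyl:
            hits.append((match.start() + 1, match.end(), region))
    return hits


# Hypothetical toy sequence, not a real cadherin cytoplasmic tail.
toy = "MKAAPSEEDTSLLGGARSSTTAAGGK"
candidates = pest_candidates(toy)
```

Candidates found this way would still need to be scored, since the text distinguishes high-scoring from low-scoring PEST sequences.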
Some of the cadherin tail sequences have low PEST scores (Fig. 6), but other low scoring PEST sequences are known to be proteolytic degradation signals (35). Moreover, it has been shown that phosphorylation may activate a latent or low-scoring PEST sequence (35, 49-53). A well-studied example is the degradation of IκB proteins, inhibitors of the transcription factor NF-κB (54). The presence and phosphorylation of PEST sequences in IκBs is required for a rapid and constitutive turnover of free IκBs that may facilitate sustained NF-κB activity (55-58). Interestingly, a subclass of constitutively active Ras superfamily members binds to and possibly sequesters the PEST regions of free IκB proteins, regulating their turnover (59). The E-cadherin PEST sequence is phosphorylated in a serine-rich region that is highly conserved among type I cadherins and is also present in type II cadherins (Fig. 6) (8), suggesting that phosphorylation can activate these degradation signals as well.
The PEST and β-catenin-binding regions of cadherins overlap extensively, and phosphorylation of the serines in this region has been shown to increase the affinity of E-cadherin for β-catenin (16). It is not known when or where in the cell cadherins are phosphorylated. Binding of cadherins to β-catenin may prevent phosphorylation by sequestering the PEST serines from kinases. Those cadherin molecules that have already been phosphorylated will have increased affinity for β-catenin. Upon β-catenin binding, these activated PEST sequences would again be sequestered, this time from recognition by the degradation machinery. Either or both of these scenarios would lead to selective degradation of those cadherin molecules that fail to bind to β-catenin.
Given that the cytoplasmic domains of cadherins are unstructured in the absence of β-catenin, they are likely to be good substrates for kinases as well as the cellular protein degradation machinery. Functional adhesion requires cadherins to be linked to the cytoskeleton through β- and α-catenins, so free cadherin molecules at the cell surface might act as competitive inhibitors of adhesion. Cytosolic levels of β-catenin are tightly controlled to prevent inappropriate activation of Wnt-responsive genes. Increasing β-catenin levels by induction of the Wnt pathway increases the formation of cadherin-catenin complex and cell-cell adhesion in some cell lines (60, 61). Likewise, the increased level of total β-catenin following overexpression of cadherin in other cell lines likely reflects stabilization of β-catenin molecules that would otherwise be turned over by the Wnt pathway (25, 62-64). These observations indicate that cadherin and β-catenin turnover are coupled. Cadherins that are not associated with β-catenin may be targeted for degradation by sequences in their unstructured cytoplasmic domains, reducing the population of catenin-free cadherins at the cell surface.