Reconstitution of Two Recombinant LSm Protein Complexes Reveals Aspects of Their Architecture, Assembly, and Function*

Sm and Sm-like (LSm) proteins form complexes engaging in various RNA-processing events. Composition and architecture of the complexes determine their intracellular distribution, RNA targets, and function. We have reconstituted the human LSm1–7 and LSm2–8 complexes from their constituent components in vitro. Based on the assembly pathway of the canonical Sm core domain, we used heterodimeric and heterotrimeric sub-complexes to assemble LSm1–7 and LSm2–8. Isolated sub-complexes form ring-like higher order structures. LSm1–7 is assembled and stable in the absence of RNA. LSm1–7 forms ring-like structures very similar to LSm2–8 at the EM level. Our in vitro reconstitution results illustrate likely features of the LSm complex assembly pathway. We prove the complexes to be functional both in an RNA bandshift and an in vivo cellular transport assay.

The Sm and Sm-like (LSm) proteins are a widespread protein family with members in all kingdoms of life. Phylogenetic distribution suggests Sm proteins were already present in the last universal common ancestor of all present-day life forms and that the family underwent an explosive diversification with the advent of eukaryotes (1). Archaebacteria harbor one or two SM/LSM genes each. Escherichia coli Hfq and its orthologues in other Gram-negative bacteria are so far the only known eubacterial LSm proteins (2,3). In contrast, eukaryotic genomes appear to contain 24 or more Sm/LSm genes (1,4). Thought to have originally arisen as chaperones mediating RNA-RNA interactions (5), Sm/LSm proteins have diversified through evolution and adopted new functionalities. LSm protein function in archaea is unknown. A structural and biochemical study on Archaeoglobus fulgidus LSm proteins showed that they bind to RNase P RNA in vivo and in vitro (6), a feature that has also been observed for several yeast LSm proteins (7). E. coli Hfq is a pleiotropic regulator of RNA metabolism (8).
Sm/LSm proteins are characterized by a bipartite sequence motif about 80 amino acids long, situated in most members at the N terminus. Recently, divergent family members with additional domains have been identified (1,4). The conserved motif translates into a fold common to all Sm/LSm proteins. This Sm fold mediates specific Sm-Sm interaction through a generic interface, which the various Sm protein family members use to build up homomeric (in prokaryotes) or heteromeric (in eukaryotes) ring-shaped complexes. These represent the functional form of all Sm/LSm proteins. The common fold, generic interface, and ring-like morphology of Sm/LSm complexes provide a rationale for the observed large variety of RNA targets bound and for the diverse complex compositions and functions. LSm proteins appear as building blocks for complexes whose composition and architecture determine their intracellular distribution, interaction with RNA targets and non-Sm effector proteins, and function. The structural basis for the balance between interaction specificity and flexibility required for assembling different complexes with some subunits in common is unknown.
The canonical Sm core domain composed of the seven Sm proteins B, D1, D2, D3, E, F, and G was demonstrated to assemble in an ordered pathway onto a conserved, single-stranded stretch on their target RNAs, the Sm site of the spliceosomal snRNAs U1, U2, U4, and U5 (22,23) (Fig. 1). The pathway is marked by the RNA-free sub-complexes D1D2, D3B, and EFG (22). The sub-complexes may constitute stages of the assembly pathway. Sm-snRNA assembly occurs in the cytoplasm. After hypermethylation of the snRNA moiety, the pre-snRNPs are transported to the nucleus, where they mature to functional particles (9). In vivo, Sm core domain assembly is a highly regulated process involving Sm protein modification by a methylase (24) and numerous assembly factors like the survival of motor neurons protein. In vitro, the Sm core domain can be assembled from Sm protein sub-complexes by the addition of target snRNA in the absence of any auxiliary factors (23). Spliceosomal U6 snRNA differs from the other snRNAs in many ways. It is thought to have an entirely nuclear life cycle (25,26), does not bear an Sm site, and does not bind the canonical Sm proteins. However, a complex built up of seven Sm-like (LSm) proteins 2-8 was shown to interact with the 3′ end of U6 snRNA in the nucleus (27), stabilizing U6 snRNP and the U4/U6 snRNA interaction (Fig. 1).
The Sm core domain can only assemble onto its U snRNA target and is only stable in the presence of the RNA (23). In contrast, the native LSm2-8 complex has been shown to be stable in the absence of RNA (27). It is likely that the LSm2-8 complex is assembled in the cytoplasm, migrates as such to the nucleus, and there binds to U6 snRNA. The LSm2-8 assembly thus differs from the canonical core Sm domain pathway. In addition to LSm2-8, a cytoplasmic LSm1-7 complex exists that engages in mRNA decapping and degradation (10,11). The two complexes have LSm proteins 2 to 7 in common, differing only in the seventh subunit (LSm8 and LSm1, respectively, Fig. 1). The LSm1-7 assembly pathway is even less well characterized than the LSm2-8 pathway. LSm1-7 has been shown to accumulate in cytoplasmic foci together with other components of the mRNA decapping and degradation machinery (28,29). These foci are apparently active sites of mRNA turnover (30), but the available data do not indicate whether LSm1-7 assembles in these foci or elsewhere in the cytoplasm, nor whether it binds its mRNA targets as a preassembled complex. It is unknown whether LSm protein sub-complexes analogous to the Sm heterodimers and heterotrimers exist. Here we show that stable, soluble human LSm2/3, LSm4/8, and LSm5/6/7 sub-complexes corresponding to their paralogues SmD1D2, SmD3B, and SmEFG can be obtained by coexpression in E. coli. LSm1 and LSm4 are produced from monocistronic vectors. Isolated sub-complexes assemble into ring-like higher order structures, underscoring the preference of eukaryotic LSm proteins to associate with heterologous binding partners. The fact that both LSm1-7 and LSm2-8 complexes can be reconstituted from these components in the absence of RNA suggests that both species assemble in the cytoplasm and bind to their target RNAs as pre-assembled units.
We show the recombinant LSm2-8 complex to be functional by in vitro bandshift with U6 snRNA and by an in vivo cell microinjection/intracellular transport assay.

MATERIALS AND METHODS
Cloning, Expression, and Purification of LSm1-8 Proteins and Subcomplexes-LSM2, -3, and -5-8 were subcloned from a human lymphoma U937 cDNA library (Stratagene) into a modified pUC19 vector. LSM1 and LSM4 were subcloned from expressed sequence tag clones IMAG p998P2110673Q2 and IMAG p958A041800Q2, respectively. Expressed sequence tag clones were obtained from the genetic resources center, Berlin (RZPD, available at www.rzpd.de). LSM2/3, LSM4/8, and LSM5/6/7 polycistronic T5 expression cassettes were constructed by successive compatible overhang cloning using engineered BamHI/BglII sites. The final cassettes were transferred to the pQE30 T5 expression vector (Qiagen, Basel). Expression constructs bear an MRGSH6 tag at the N terminus of the first cistron, followed by a tobacco etch virus (TEV) cleavage site. LSM1 and LSM4 were subcloned into pQE30 as monocistrons. SG13009[pREP4] (for LSm1, LSm2/3, LSm4, and LSm5/6/7) or BLR[pREP4] (for LSm4/8) E. coli cells were transformed with plasmid DNA and plated out on selective media. LB starter cultures were grown at 30°C overnight, and 2-12 liters of LB media were inoculated the next day. Cultures were grown to an A600 of 0.8 at 37°C and induced with 1 mM isopropyl 1-thio-β-D-galactopyranoside. Induction temperature was between 25°C and 37°C. Cells were harvested after 4-48 h of induction, depending on construct. Cell pellets were resuspended in lysis buffer (20 mM HEPES-Na, pH 7.50, 0.5-1.0 M NaCl, 10 mM imidazole-Cl, pH 7.50, 5 mM β-mercaptoethanol), sonicated, and treated with DNase I. Insoluble material was removed by ultracentrifugation, and supernatants were purified by immobilized metal ion affinity chromatography (IMAC) on nickel-charged Hi-Trap chelating Sepharose columns (Amersham Biosciences). LSm proteins and sub-complexes were eluted with imidazole step gradients (60, 250, and 500 mM).
If insufficiently pure, samples were subsequently dialyzed into 100 mM NaCl buffer without imidazole and subjected to ion exchange chromatography (100 mM to 1 M NaCl). Samples were frozen in liquid nitrogen in ion exchange buffer. In some instances, the MRGSH6 tags were cleaved off by TEV protease (1:100 ratio, overnight at room temperature), and the sub-complexes purified by IMAC. Cloning and expression of Sm protein sub-complexes D1D2 and D3B have been described elsewhere (31). An SmEFG heterotrimer was produced from a pET15b vector (Novagen) and purified via consecutive IMAC and ion exchange chromatographies. 12% SDS-PAGE gels were run with equivalent amounts of total cell extracts at the time of induction (T0) and time of harvest (T4-T48) for both soluble material and pellet (insoluble material). After staining with Coomassie Brilliant Blue R250 (GERBU Biochemicals, Germany) and destaining, gels were scanned, and the bands corresponding to soluble and insoluble LSm proteins were integrated via densitometry using the program ImageJ 1.29x (Wayne Rasband, National Institutes of Health).
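The solubility measure derived from the densitometry above (the supernatant:pellet ratio, and its change on coexpression) can be sketched as follows. This is only an illustration: the function names and the band intensities are hypothetical and are not taken from the study.

```python
def sp_ratio(soluble_intensity: float, pellet_intensity: float) -> float:
    """Supernatant:pellet (S:P) ratio from two integrated band intensities."""
    if pellet_intensity <= 0:
        raise ValueError("pellet intensity must be positive")
    return soluble_intensity / pellet_intensity


def fold_increase(ratio_coexpressed: float, ratio_single: float) -> float:
    """Change in S:P ratio on coexpression relative to single-cistron expression."""
    if ratio_single <= 0:
        raise ValueError("single-cistron S:P ratio must be positive")
    return ratio_coexpressed / ratio_single


# Hypothetical integrated band intensities (arbitrary densitometry units).
single = sp_ratio(soluble_intensity=200.0, pellet_intensity=1000.0)
coexpressed = sp_ratio(soluble_intensity=1000.0, pellet_intensity=200.0)

print(f"S:P single cistron: {single:.2f}")
print(f"S:P coexpressed:    {coexpressed:.2f}")
print(f"Increase:           {fold_increase(coexpressed, single):.1f}-fold")
```

Because both bands come from the same gel lane pair, the ratio is independent of the absolute staining intensity, which is presumably why it is used as the solubility measure.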
Reconstitution and Purification of LSm1-7 and LSm2-8 Complexes-Individual LSm protein or sub-complex preparations were incubated in 4 M urea, 1 M NaCl buffer for 2 h at 37°C and then mixed in equimolar amounts for the assembly of the desired heptamer. The mix was incubated again for 2-5 h, and the sample was dialyzed against buffer with progressively less salt (1 M and 0.5 M NaCl) overnight at 4°C. Reconstituted LSm1-7 and LSm2-8 were purified by consecutive gel filtration and anion exchange chromatographies.
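As an illustration of the equimolar mixing step, the sketch below converts a target molar amount into pipetting volumes. The 25 kDa and 33 kDa masses are the nominal values quoted in this paper for LSm2/3 and LSm5/6/7. The 28 kDa value for LSm4/8 is an assumption, obtained by subtracting those two from the 86 kDa nominal mass of LSm2-8. The stock concentrations and the 10 nmol target are hypothetical.

```python
# Nominal sub-complex masses in kDa. LSm4/8 is an assumed value
# (86 kDa nominal LSm2-8 mass minus 25 kDa and 33 kDa).
MASS_KDA = {"LSm2/3": 25.0, "LSm4/8": 28.0, "LSm5/6/7": 33.0}

# Hypothetical stock concentrations in mg/ml (equivalent to ug/ul).
STOCK_MG_PER_ML = {"LSm2/3": 2.5, "LSm4/8": 2.0, "LSm5/6/7": 3.0}


def volume_ul(amount_nmol: float, mass_kda: float, stock_mg_per_ml: float) -> float:
    """Volume of stock (ul) that contains amount_nmol of a sub-complex.

    1 kDa corresponds to 1 ug per nmol, so nmol * kDa gives micrograms,
    and dividing by ug/ul (= mg/ml) gives microliters.
    """
    if stock_mg_per_ml <= 0:
        raise ValueError("stock concentration must be positive")
    return amount_nmol * mass_kda / stock_mg_per_ml


target_nmol = 10.0  # hypothetical amount of each sub-complex
volumes_ul = {
    name: volume_ul(target_nmol, MASS_KDA[name], STOCK_MG_PER_ML[name])
    for name in MASS_KDA
}

for name, vol in volumes_ul.items():
    print(f"{name}: {vol:.0f} ul for {target_nmol:.0f} nmol")
```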
FIG. 1. Schematic representation of the three best characterized Sm/LSm protein complexes. Sm core domain, part of the spliceosomal U1, U2, U4 snRNPs, LSm2-8 binding to the 3′ end of U6 snRNA, and LSm1-7, binding to the 3′ untranslated region (thin line) of mRNAs destined to be degraded in the cytoplasm. The latter complex has been shown to interact with other components of the mRNA decapping/degradation machinery, the decapping enzyme Dcp1, the exonuclease Xrn1, and the auxiliary factor Pat1.
Electron Microscopy and Image Processing-Samples were diluted to 10-20 µg/ml. Aliquots of 5 µl were stained with 1% (w/v) uranyl acetate after sample adsorption onto glow-discharged 400-mesh carbon-coated grids. The micrographs were recorded at an accelerating voltage of 100 kV and a magnification of 50,000×, using a Hitachi 7000 electron microscope. All micrographs were recorded on Kodak SO-163 film. Reference-free alignment was performed on manually selected particles from digitized electron micrographs using the EMAN image processing package (32). After multivariate statistical analysis of a previously generated set of rotational and translational invariants, a reference-free k-means classification was performed on the resulting footprint file. The resulting classified images were then aligned and classified iteratively. The class averages with the best signal-to-noise ratio were selected and gathered in a gallery.
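The k-means classification step can be pictured with the toy sketch below. It is not the EMAN implementation: it is a plain Lloyd's k-means written for illustration, and the two-dimensional feature vectors standing in for particle invariants are hypothetical.

```python
from math import dist


def kmeans(points, k, iterations=20):
    """Plain Lloyd's k-means on feature vectors given as tuples of floats."""
    # Deterministic start: take k points spread evenly through the input.
    step = len(points) // k
    centroids = [points[i * step] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins the class of its nearest centroid.
        labels = [
            min(range(k), key=lambda c: dist(p, centroids[c])) for p in points
        ]
        # Update step: each centroid becomes the mean of its class members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a class is empty
                centroids[c] = tuple(
                    sum(col) / len(members) for col in zip(*members)
                )
    return labels, centroids


# Hypothetical invariant features for six particles forming two groups.
features = [
    (0.9, 1.1), (1.0, 1.0), (1.1, 0.9),
    (5.0, 5.1), (5.1, 4.9), (4.9, 5.0),
]
labels, centroids = kmeans(features, k=2)
print(labels)
print(centroids)
```

In an actual image-processing workflow the class centroids would correspond to class averages, which are then realigned and reclassified iteratively as described above.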
Gel Permeation Chromatography/Static Light Scattering Analysis-LSm sub-complexes were run on a Superdex 200 HR10/30 gel permeation column using an ÄKTA Explorer FPLC (both Amersham Biosciences) coupled to a miniDAWN static light scattering analyzer and an Optilab DSP refractometer (both Wyatt Technology Corp., CA). Data were analyzed using Wyatt's ASTRA 4 software. Analytical gel filtrations were run on a Superdex 200 PC3.2/30 column using an Ettan HPLC (Amersham Biosciences).
Analytical Ultracentrifugation-LSm sub-complexes were subjected to an equilibrium sedimentation run in 20 mM HEPES-Na, pH 7.5, 200 mM NaCl, 5 mM β-mercaptoethanol buffer on an Optima XL-A analytical ultracentrifuge (Beckman Coulter) at 12,000 rpm. Data analysis was performed using the program DISCREEQ (50).
Electromobility Shift Assays-Xenopus tropicalis U6 snRNA or Xenopus laevis U1 snRNA (a kind gift from Iain Mattaj, EMBL Heidelberg) were in vitro transcribed and body-labeled with [32P]UTP. 20,000 cpm purified U snRNA was incubated with 5 pmol of an equimolar mixture of the Sm D1D2, D3B, and EFG sub-complexes or LSm protein heptameric complexes in a buffer containing 20 mM HEPES-Na, pH 7.50, 300 mM NaCl, 5 mM MgCl2, 10% glycerol, 0.5 µl of RNasin (Promega) and 0.5 mg/ml yeast tRNA in a 10-µl assay at 30°C for 1 h, then at 37°C for 1 h. Samples were loaded on 6% native PAGE gels and run at 4°C for 2.5 h, 160 V. Gels were autoradiographed wet for 14-16 h at −80°C on x-ray film.
Cell Microinjections-REF52 rat fibroblasts were grown to 60-80% confluency on coverslips in Opti-Mem™ medium with glutamine (Invitrogen). LSm1-7 was labeled with Alexa555, LSm8 and LSm2-8 with Alexa488 succinimidyl ester fluorescent dyes (Molecular Probes) according to the manufacturer's protocol. Excess dye was removed by dialysis into microinjection buffer (20 mM HEPES-Na, pH 7.5, 150 mM NaCl, 5 mM β-mercaptoethanol). Aggregates were removed by centrifugation (15 min at 13,000 rpm), and the supernatant was injected into cells. After incubation for 30-120 min, cells were fixed and visualized using an Olympus confocal fluorescence microscope. For analysis of LSm2-8 active transport, wheat germ agglutinin (Sigma) was coinjected at a concentration of 2.5 mg/ml.

RESULTS
Expression of canonical human Sm proteins in E. coli from single cistron vectors gives very low yields or insoluble protein.
In contrast, high yields of soluble Sm proteins are obtained by coexpressing the SmD1D2, SmD3B, and SmEFG sub-complexes from polycistronic expression vectors (31). These correspond to the sub-complexes identified in HeLa cell nuclear extract (22).
Although some LSm proteins can be expressed more efficiently in a soluble form from monocistronic vectors than their canonical Sm protein paralogues, in general the yield is very low and the resulting preparations tend to aggregate heavily. Based on our experience with Sm protein coexpression, and to facilitate expression and purification of LSm proteins, we initially constructed polycistronic expression vectors encoding LSm2/3, LSm4/8, and LSm5/6/7 cDNAs. These heterodimers and heterotrimers correspond to the canonical SmD1D2, SmD3B, and SmEFG sub-complexes, respectively. LSm1 and LSm4 were constructed and expressed as monocistrons for the reconstitution of LSm1-7. Expression yield and solubility of single-cistron LSm constructs could be greatly enhanced by fusing to two N-terminal Z tags (Staphylococcus aureus protein A IgG-binding domain), followed by a His6 tag and a TEV cleavage site. This phenomenon is exemplified by the ZZ-His6-TEV-LSm6 purification record (Fig. 2d). For crystallographic and other studies, we proceeded to express LSm5, LSm6, LSm8, and the complexes LSm5/6, LSm5/7, LSm6/7, and LSm5/3 (data not shown). In general, the solubility of a given LSm protein increased up to 25-fold on coexpression, as measured by the supernatant:pellet ratio (Table I). Recombinant LSm protein sub-complexes were purified by Ni-IMAC followed by ion exchange chromatography where necessary. For each polycistron, only the first cDNA bears a His6 tag; the other LSm proteins are isolated through sub-complex formation and co-purification. The complexes and single LSm proteins were purified to homogeneity, as shown by SDS-PAGE (Fig. 2, a-d). Sample integrity is further demonstrated by the successful crystallization of various LSm protein preparations. Weakly diffracting crystals could be obtained from LSm6 (Fig. 2e) and LSm5/6/7 (Fig. 2f).
The purified sub-complexes were characterized biophysically. Analytical ultracentrifugation (AUC) and static light scattering experiments combined with gel filtration chromatography yielded molecular weights that indicate formation of higher order structures (Fig. 3, a-c, and Table II). The LSm2/3 heterodimer has a nominal molecular mass of 25 kDa. In the analytical ultracentrifuge, the LSm2/3 oligomer distribution is bimodal at 10 µM concentration, containing a hexamer (10%) and an octamer (87%). The molecular mass of LSm5/6/7 is 33 kDa. Analytical ultracentrifugation yields a mixture of individual subunits (26%), trimer (25%), hexamer (40%), and nonamer (8%) species at 16 µM concentration. Analytical gel filtration combined with static light scattering measurements yields 85 kDa for LSm2/3 and 77 kDa for LSm5/6/7. These values reflect the heterogeneity in oligomer distribution found by AUC. LSm5/6/7 stays intact during gel filtration, and individual subunits are not observed. Upon incubation in up to 8 M urea, the highest elution volume of LSm5/6/7 species corresponds to the trimer. This stands in contrast to LSm2/3, which at urea concentrations of 4 M and higher, falls apart to some extent into its subunits (data not shown). LSm4/8 aggregates most strongly of the LSm sub-complexes and does not seem to form oligomeric higher order structures of defined stoichiometry (Table II).
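One way to read the light scattering masses against the nominal sub-complex masses is to express each measured mass as an apparent number of sub-complex copies. A non-integer value is what a mixture of oligomeric states would be expected to give. The sketch below uses only the masses quoted in the paragraph above; the function names are illustrative, and the calculation is a simple ratio rather than the analysis performed in the study.

```python
import math

# Nominal sub-complex masses and light scattering masses (kDa), as quoted in the text.
PROTOMER_KDA = {"LSm2/3": 25.0, "LSm5/6/7": 33.0}
SLS_KDA = {"LSm2/3": 85.0, "LSm5/6/7": 77.0}


def apparent_copies(measured_kda: float, protomer_kda: float) -> float:
    """Measured mass expressed in units of one sub-complex (dimer or trimer)."""
    return measured_kda / protomer_kda


def bracketing_oligomers(measured_kda: float, protomer_kda: float):
    """Masses of the integer oligomers just below and just above the measured mass."""
    n = apparent_copies(measured_kda, protomer_kda)
    return math.floor(n) * protomer_kda, math.ceil(n) * protomer_kda


for name in PROTOMER_KDA:
    n = apparent_copies(SLS_KDA[name], PROTOMER_KDA[name])
    low, high = bracketing_oligomers(SLS_KDA[name], PROTOMER_KDA[name])
    print(f"{name}: {n:.2f} copies, between {low:.0f} and {high:.0f} kDa")
```

For LSm2/3 the bracketing masses correspond to three and four heterodimers (hexamer and octamer), and for LSm5/6/7 to two and three heterotrimers (hexamer and nonamer), which are the species named in the ultracentrifugation results.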
Our concept of sub-complex higher order structure is confirmed by negative-stain electron microscopy. LSm2/3 (Fig. 4a, overview) shows up as ring-shaped structures with slightly smaller dimensions than the 8-nm outer diameter and 2 nm for the central hole that were measured for the native LSm2-8 complex from HeLa cell nuclear extract (27). LSm5/6/7 shows up mainly as a ring-shaped structure as well, but heterogeneities appear in the background of the electron micrographs (Fig. 4b, overview). LSm4/8 aggregated too strongly to yield homogeneous particles in electron micrographs (data not shown). After particle classification and subsequent class averaging, distinct ring particles can be observed in LSm2/3 galleries (Fig. 4a, bottom) having an outer diameter of ~7 nm and a cavity of <1.5 nm. Considering the mass of the LSm2/3 heterodimer (25 kDa), we suggest that the resulting class averages correspond to octameric LSm2/3 ([LSm2/3]4) in accordance with the AUC measurements. Although the LSm5/6/7 preparation did not appear as homogeneous as LSm2/3, class averaging yielded ring-shaped particles having a size of ~7 nm and a cavity of 1.5 nm, suggesting a hexameric arrangement ([LSm5/6/7]2, 2 × 33 kDa) based on the AUC data. Nevertheless, from the electron microscopy analysis, a nonameric arrangement cannot be excluded. Smaller particles appear in the background (Fig. 4b, circles) and could represent LSm5/6/7 trimers. The small size of such particles was not suitable for classification.
The Sm core domain can be reconstituted in vitro from recombinant Sm sub-complexes and U snRNA with good efficiency under native buffer conditions (33). The LSm2-8 complex isolated from HeLa cell extract is, in contrast, stable and likely to assemble in the absence of RNA (27). For our LSm complex in vitro reconstitution protocol, we required the disruption of the higher order structures formed by the sub-complexes. The in vitro reconstitution process should then be guided by relative thermodynamic stability. Reconstitution was carried out by mixing equimolar amounts of LSm2/3, LSm4/8, and LSm5/6/7 (for LSm2-8) or LSm4, LSm1, LSm2/3, and LSm5/6/7 (for LSm1-7) under semi-denaturing conditions (for details, see "Materials and Methods") followed by dialysis. The reconstituted complexes were then purified by ion exchange chromatography (Fig. 5, c and d) followed by gel filtration (Fig. 5, e and f). In both types of chromatography, they elute as single peaks, demonstrating sample homogeneity in charge and in size. The SDS-PAGE gels of the purified LSm1-7 and LSm2-8 clearly show the presence of all seven different subunits (Fig. 5, a and b, lanes 5 and 4, respectively). Molecular mass determination by gel filtration chromatography coupled to static light scattering yielded a figure of 92 kDa for LSm2-8 (Table II). LSm1-7 was analyzed by analytical ultracentrifugation. At 10 µM concentration, LSm1-7 is a mixture of heptamer and 14-mer. At 20 and 50 µM, the proportion of 14-mer grows (Table II). An alternative model with the major components of sub-complex preparations (LSm2/3 octamer, ~100 kDa, and LSm5/6/7 hexamer, ~66 kDa) does not satisfactorily fit the data. The comparatively large losses through aggregation in the LSm1-7 AUC run stem from using an only partially purified sample still containing oligomers of different composition that appear to be far more prone to aggregation than the heptamer.
In conclusion, once purified, LSm1-7 does not fall apart into its sub-complexes and does not subsequently rearrange into alternate higher order structures in solution. Both heptameric complexes elute from gel filtration chromatography with elution volumes corresponding to molecular masses between 85 and 99 kDa (Fig. 5, e and f), in accordance with the calculated masses (86 and 93 kDa, respectively). It should be noted that the in vitro reconstitution of the canonical Sm core domain works with good yields in native buffer, whereas LSm1-7 and LSm2-8 do not form in vitro under these conditions. We proceeded to take electron micrographs of negatively stained LSm1-7 and LSm2-8 complexes (Fig. 4, c and d). Both species show up as ring-like shapes. In the case of LSm1-7, as for LSm5/6/7, smaller particles appear in the background (Fig. 4c, white circles) and could represent fragments of LSm1-7. The LSm2-8 preparation appears more homogeneous (Fig. 4d). The LSm1-7 outer dimension is ~7 nm. The central accumulation of stain measures <1.5 nm. For LSm2-8, the values are 8 and 3 nm, respectively. Because these preparations, in contrast to the canonical core snRNP domain (23), do not contain RNA, we assume that the central feature represents a cavity, or hole. The recombinant LSm2-8 architecture thus corresponds to its native counterpart at the ultrastructural level (27). Remarkably, the LSm1-7 complex is very similar to LSm2-8 at this resolution (Fig. 4, compare panels c and d). Our data provide the first experimental evidence that LSm1-7 assembles into a structure that is very similar to LSm2-8 and the canonical core domain. Whether this is also true at the atomic level must await the solution of the crystal structures of the three complexes.
The native LSm2-8 complex was initially isolated from U4/U6 snRNP and shown to bind to U6 snRNA in vitro (27). To test the function of our recombinant LSm complexes in vitro and assess binding specificity, we performed electrophoretic mobility shift assays (bandshift) with U6 and U1 snRNA. Preassembled, purified LSm2-8 shifts U6 snRNA (Fig. 6a, lane 5), whereas individual sub-complexes LSm2/3, LSm4/8, or LSm5/6/7 do not (Fig. 6a, lanes 2-4). Neither does a reconstituted LSm particle in which either of the sub-complexes (Fig. 6b, lanes 4-6) or a single LSm protein (LSm6, Fig. 6b, lane 7) has been left out. Leaving out LSm2/3 leads to sample aggregation and a shift into the well (lane 4). LSm1-7 complex does not shift U6 snRNA under the same conditions (Fig. 6b, lane 3). Specificity of complex formation could further be demonstrated by adding an LSm2/3-specific antibody to the reaction mixture, which leads to a supershift (Fig. 6c, lane 3). Under the same assay conditions, LSm2-8 does not bind strongly to U1 snRNA (Fig. 6d, lane 3), in contrast to a 1:1 mixture of the seven canonical Sm proteins (D3B + D1D2 + EFG, lane 2). The combination of Sm proteins added to U6 snRNA leads to aggregation and a shift into the well (lane 5). Increasing the stringency of the assay abolishes the slight background LSm2-8-U1 snRNA interaction (panel d, lane 2) as well as the aggregation in the Sm-U6 snRNA reaction (lane 5), but invariably the LSm2-8-U6 snRNA bandshift as well (data not shown). In summary, the recombinant LSm2-8 complex shows the same RNA binding characteristics as its native counterpart, and is functional in vitro.
Consistent with its functions in splicing and rRNA processing, the LSm2-8 complex has been found to localize in the nucleus (34), and in the nucleolus (35). LSm1-7 accumulates in particular cytoplasmic features called foci (28) or GW bodies (36,37). To test whether our recombinant LSm complexes show corresponding subcellular distributions, we injected fluorescently labeled LSm1-7 and LSm2-8 into rat fibroblasts. LSm1-7 distributes mainly in the cytoplasm, where it accumulates in distinct spots (Fig. 7a). In contrast, LSm2-8 concentrates in the cell nucleus (Fig. 7b). LSm2-8 nuclear migration is specific, because LSm8 on its own does not migrate to the nucleus, but leads to formation of pre-apoptotic granules instead (Fig. 7c). Nuclear transport of LSm2-8 is an active process, because it can be transiently blocked by coinjection of wheat germ agglutinin (Fig. 7, compare panels d with e (38)). Labeling of the LSm1-7 and LSm2-8 heptamers does not destabilize the complexes: On gel filtration, the labeled species elute at the same elution volume as the non-labeled heptamers (Supplementary Fig. S1, a and b; the dye absorption at 495 nm coincides with the complex elution profile, and data not shown), and the peaks contain all seven resident proteins in stoichiometric amounts (Fig. S1, c and d).

DISCUSSION
As research in genomics and RNA processing progresses, ever more proteins containing the Sm/LSm motif are discovered, and new functionalities of LSm protein complexes are identified. Still, very little is known about LSm complex assembly pathways, or about how the architecture of the often very similar complexes determines their specific function. Eukaryotic Sm/LSm proteins have a strong preference to form heterooligomers rather than homooligomers. Canonical Sm proteins form RNA-free heterodimers and heterotrimers that likely represent intermediates on the core snRNP domain assembly pathway.
TABLE I. LSm protein and sub-complex solubilities. Solubilities are expressed as supernatant:pellet ratios (column: Ratio S:P), determined by band densitometry of SDS-PAGE gels as described under "Materials and Methods." Increase in solubility by heterologous coexpression is given for those LSm proteins that were also expressed as single cistrons, as judged by the change in the S:P ratio (column: Increase). Sub-complexes corresponding to assumed nearest neighbors in the LSm1-7 and LSm2-8 rings (see Fig. 1
Specificity of LSm-LSm interaction impacts directly on the assembly process, because lack of it must be overcome by the help of cellular assembly factors. Here we have presented results that show how LSm complex self-assembly can be successfully carried out in vitro in the absence of such assembly factors and results in correctly assembled, functional heptameric LSm2-8 and, presumably, LSm1-7 complexes. LSm proteins tend to be more soluble than Sm proteins when produced singly. Nevertheless, providing another LSm protein as a heterologous binding partner in the same cell generally increases solubility by a factor of up to 25. In this way, we were able to produce soluble, stable LSm2/3, LSm4/8, and LSm5/6/7 sub-complexes, corresponding to SmD1D2, SmD3B, and SmEFG. However, the increase in solubility is independent of the combination and does not correlate with coexpression of assumed nearest neighbors in the LSm2-8 ring. We conclude there is a lower degree of LSm-LSm interaction specificity, as compared with the Sm-Sm interactions in the core snRNP domain. The results are in line with yeast two-hybrid data indicating a greater promiscuity for LSm than Sm proteins (39,40). The findings impact on the cellular LSm2-8 assembly pathway: lower intrinsic interaction specificity puts a higher demand on assembly factors guiding productive ring assembly.
Indeed, LSm2-8 assembly in vivo could be promoted by snRNP assembly factors like survival of motor neuron, which has been demonstrated to interact with LSm4 in vitro (41,42). However, evidence that these interactions are also present in vivo is as yet lacking. We could show that the LSm2/3 and LSm5/6/7 sub-complexes assemble into higher order ring-shaped heterooligomers by negative-stain electron microscopy. From the AUC results that indicated predominantly octamers (tetramers of dimers) for LSm2/3, we assume the LSm2/3 rings represent octamers. However, all Sm or LSm rings reported so far have either six or seven subunits, and it remains to be proven that the generic Sm-Sm interface defined by the D3B and D1D2 heterodimers is capable of accommodating eight subunits in a ring. Alternatively, the LSm2/3 rings could be representing hexamers present in the LSm2/3 preparation at low concentration, in line with LSm5/6/7. Hexamer formation by an Sm sub-complex was previously demonstrated as a feature of the human EFG trimer (43). The physiological significance of this hexamer could not be demonstrated, and indeed the later establishment of the Sm core domain stoichiometry proved that the (EFG)2 complex is not part of the final heptameric ring and most likely represents a storage form for the three proteins. Presence of the hexamer does not preclude heptamer formation in vitro: recombinant EFG preparations also show the hexamer, but can efficiently be reconstituted into a functional Sm core domain by the addition of SmD1D2, SmD3B, and U1 snRNA under native buffer conditions. Electron micrographs of the LSm5/6/7 sub-complex rings indicate smaller dimensions (<1.5 nm and ~7 nm for the inner and outer diameter, respectively) than for LSm2-8 (see below).
Because these comparatively smaller values are also found for the (heptameric) LSm1-7 and the (octameric or hexameric) LSm2/3 rings, one cannot conclude that size correlates either with the number of subunits in the ring or with RNA-binding characteristics. Native LSm sub-complexes probably do not bind RNA by themselves. This would fit with our results that in contrast to LSm2-8, none of the sub-complexes (or combinations thereof) binds U6 snRNA. Formation of higher order structure by the sub-complexes reflects the predilection of eukaryotic LSm proteins to form heteromeric complexes, rather than the homomeric complexes formed by their prokaryotic homologues. Our singly expressed LSm proteins generally form aggregates without defined stoichiometry (data not shown). The ring closure is likely due to the need to satisfy all available Sm-Sm interfaces by interaction with another LSm molecule. The Sm-Sm interface possesses a pronounced hydrophobic element (31), which, if unsatisfied by binding to a specific partner, leads to rapid aggregation and precipitation.
The sub-complexes can be assembled in vitro and in the absence of any RNA into LSm1-7 and LSm2-8 complexes. Native LSm2-8 was previously shown to be stable in the absence of its target, U6 snRNA. For LSm1-7, similar information has not been available until now. Human LSm1-7 accumulates in cytoplasmic foci, which are assumed to represent sites of mRNA decapping/degradation, or storage forms of the involved enzymes (28-30). However, it is not known whether LSm1-7 is pre-assembled elsewhere in the cytoplasm, arrives at these foci as an RNA-free complex, and binds to its target mRNAs on site. Our data suggest that LSm1-7 indeed binds to mRNA in a pre-assembled form.
The reconstituted complexes elute as single peaks from ion exchange chromatography, demonstrating sample homogeneity in charge. Because of the great variation in pI within the LSm subunits (from 4.3 for LSm8 to 10.0 for LSm4), it is thus very unlikely that the LSm1-7 and LSm2-8 preparations consist of several sub-populations, each lacking one particular LSm subunit and containing two of another instead. The SDS-PAGE gels of the purified heptamers clearly show the presence of all seven different subunits. Both LSm1-7 and LSm2-8 preparations are homogeneous in size as well: they elute as single, Gaussian peaks from gel filtration with elution volumes corresponding to the expected molecular weights of the heptamers. The accuracy of molecular weight determination for LSm2-8 by static light scattering is >6%. Because the smallest subunit, LSm6, has a molecular mass of 9.1 kDa, representing about 10% of the complex's mass, the value of 92 kDa obtained for LSm2-8 (nominal molecular mass = 86 kDa) is only compatible with a subunit number of seven. Similarly, the AUC analysis of LSm1-7 demonstrates the presence of a heptameric species in solution, which is at equilibrium with higher order oligomers, but not with smaller complexes like those observed in the sub-complex AUC runs. This result further illustrates the complex's stability and homogeneity. Taken together, sample homogeneity and composition together with the molecular weight determination results provide strong evidence for a "one of each subunit" stoichiometry of the recombinant LSm complexes, in line with the architecture of the canonical core Sm domain (44).
Negative stain electron micrographs show that recombinant LSm2-8 has a ring-like architecture with a diameter of ~8 nm. The shape and size are highly similar to those previously observed for the native LSm2-8 complex isolated from HeLa cell nuclear extract (8 nm (27)) and for the core snRNP domain from the same source (45). The pore diameter we observed for the recombinant LSm2-8 complex is distinctly larger than in the native LSm2-8 complexes (3 versus 2 nm, respectively (27)). Because the recombinant complex shows the same RNA binding specificity, this difference must remain unexplained at the present time. The LSm1-7 rings appear to be slightly smaller,

FIG. 5. Reconstitution of LSm1-7 (a, c, and e) and LSm2-8 (b, d, and f) heptamers. SDS-PAGE gels show input sub-complexes or subunits (a and b); homogeneity in charge is demonstrated by the single peak elution profile from anion exchange chromatography (c and d), and homogeneity in size by the elution profile from gel filtration chromatography (e and f, see text).

E. coli Hfq with an AU5G RNA oligonucleotide shows that pore size increases from 12 Å for the RNA-free hexamer to 15 Å for the RNA complex (47). In all LSm co-crystal structures solved with RNA oligonucleotides, the RNA molecules mainly wrap around the rim of the pore, although in one case, additional binding sites on the ring surface have been observed (48). This stands in contrast to the original concept that in the core snRNP domain, the Sm site target RNA threads through the pore of the heptamer. The concept was based on the electrostatics of the core domain model and the position of conserved residues assumed (and later shown) to bind RNA (31,49). Structural evidence to corroborate this idea has so far only been obtained at the ultrastructural level, by cryoelectron microscopy of the U1 snRNP (50). For the LSm2-8-U6 snRNA interaction, the binding determinant has been shown to be the U5 stretch at the 3′ end of U6 snRNA (27). This target is freely accessible to a preassembled complex. Hence it is possible that the RNA threads through the LSm2-8 central cavity. However, the smaller pore diameter of the recombinant LSm1-7 complex could indicate differences from LSm2-8 in RNA binding. LSm1-7 binds to the 3′ untranslated regions of deadenylated mRNAs.
Although the RNA binding determinants for the LSm1-7 complex have not been characterized in detail, LSm1-7 presumably does not bind to the extreme 3′ end of its target mRNAs. At least in some cases, secondary structure elements found in many of its target 3′ untranslated regions are likely to prevent the RNA threading through the LSm1-7 hole. The established biochemical features of LSm1-7 and LSm2-8-RNA interaction fit very well with the concept that both LSm1-7 and LSm2-8 assemble in the absence of RNA, are transported to their site of action, and bind to their targets on site, possibly using different binding modes. Elucidation of the exact mode of LSm1-7 and LSm2-8-RNA interaction will have to await solution of the respective crystal structures.
Recombinant LSm2-8 binds to U6 snRNA in vitro, whereas LSm1-7 does not. The RNA binding characteristics of the two native complexes are thus reflected by their engineered counterparts. However, we do not as yet possess a suitably short RNA target to demonstrate specific interaction with LSm1-7. Indeed, the precise nature of the binding determinants on target mRNA for the LSm1-7 complex is currently not known. The validity of using U6 snRNA interaction as a measure of LSm2-8 function is underscored by the fact that only the integral LSm2-8 complex specifically binds to U6. Leaving out a single LSm protein or one of the sub-complexes from the reconstitution procedure produces complexes incapable of binding U6 snRNA. This observation holds despite the likelihood that all these mixtures will form ring-shaped higher order structures, just as the sub-complexes themselves do. Ring-shaped multimers are ubiquitous among nucleic acid binding complexes and in other cellular processes (51-53). The ring architecture is thought in general to confer new biophysical properties on the resident protein subunits, and often to convey new functions (54). The failure of the LSm sub-complexes to bind U6 snRNA shows that the ring architecture and the presence of LSm family members in the complex are not sufficient for specific interaction. This is in line with the need for strong RNA target discrimination based on the presence or absence of a single specific subunit.
Our cell microinjections of fluorescently labeled LSm complexes or proteins show that the intracellular distribution of the recombinant heptamers reflects the migration behavior of their native counterparts, implying that the in vitro reconstituted complexes are functional in vivo. LSm2-8 nuclear transport is active and not diffusive. Fluorescent labeling of the heptamers does not disrupt them. These observations provide some evidence that the transported species is the intact heptamer. In a transfection assay, LSm8 is found to accumulate in the nucleus (28). In contrast, in our cell microinjection assay, LSm8 (the subunit likely to bear the nuclear transport determinant of LSm2-8) fails to accumulate in the nucleus. This discrepancy could be due to the production mode of the protein and the time course of the experiment: singly expressed, our recombinant LSm8 forms aggregates likely to mask a resident nuclear localization signal. The aggregates are probably also toxic to the cells, explaining the occurrence of pre-apoptotic granules. Conversely, within the ~36 h of the transfection experiment, it is conceivable that the YFP-labeled LSm8 (produced at levels only slightly higher than the endogenous protein) assembles into functional LSm2-8, which is then the transport substrate (28). On the basis of these experiments, however, nuclear migration of isolated LSm8 cannot be ruled out.

[...] LSm2-8 (b, d, and e), or LSm8 alone (c) with Alexa488 fluorescent dye and injected into REF52 rat fibroblasts. Intracellular distribution was monitored 30-40 min post injection by fluorescence microscopy (a-c). LSm8 injection led not to nuclear accumulation but to the appearance of peri-nuclear granules possibly indicative of ensuing apoptosis (c). Coinjection of wheat germ agglutinin inhibited LSm2-8 nuclear transport up to ~1 h (d). Inhibition was reversed upon longer incubation (2 h, panel e).
Our findings imply that we have to view the specific interactions and functions of individual LSm subunits in the context of the ring architecture: exposure and probably juxtaposition of particular sequence elements in the subunits are likely to be instrumental in defining the interaction of the complex with its target RNA, with assembly factors (e.g. a presumptive nuclear import receptor for LSm2-8), and with effector proteins (like the exonuclease Xrn1 and the decapping factor Dcp1/2 in the case of LSm1-7). Our recombinant LSm protein complexes represent an ideal test system to study these interactions in molecular detail. Our results should contribute to the understanding of the pathway of LSm complex assembly and of the regulation of LSm-RNA and LSm-protein interaction and function.