Structural organization of transcription termination factor Rho.

Escherichia coli has two known modes for termination of RNA transcription (1–4). One is intrinsic to the function of RNA polymerase, which can spontaneously terminate transcription in response to certain, limited sequences. The other mode is dependent upon the action of an essential protein factor called Rho and occurs at sequences that are specific for its function but that are less constrained than the sequences for intrinsic termination. Rho protein functions as a hexamer of a single polypeptide chain with 419 residues, which is the product of the rho gene (5). It is an RNA-binding protein with the capacity to hydrolyze ATP and other nucleoside triphosphates. Rho acts to cause termination by first binding to a site on the nascent transcript and by subsequently using its ATP hydrolysis activity as a source of energy to mediate dissociation of the transcript from RNA polymerase and the DNA template (6). In the cell, the ability of Rho to act at several terminators is dependent upon the presence of an essential 21-kDa protein called NusG (7) that binds both RNA polymerase and Rho itself (8). In vitro the dependence on NusG became apparent only at proximal terminators (at sites <300 base pairs from the promoter) and under conditions when the RNA molecules are being elongated at the in vivo rate of ~40–50 nucleotides/s (9). The requirement for NusG when RNA chain growth is fast suggests that NusG is acting to overcome a kinetic limitation of Rho to act alone, perhaps through mediating earlier access to the nascent RNA by the formation of a complex of Rho with RNA polymerase (10). The mechanism of how Rho acts to dissociate the transcription complex is unknown. One important approach to elucidating how interactions with RNA mediate termination of transcription is to determine the structure of the protein.
Until a good crystal structure becomes available the properties of its structure will have to be inferred from other, less direct methods, such as biochemical characterization of the protein, phylogenetic comparative analyses, and the functional properties of mutants with known amino acid changes. This review summarizes our current understanding of the structure and function of transcription termination factor Rho based on these indirect approaches.
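The kinetic argument for NusG can be put in rough numbers. The sketch below is not taken from the cited work. It divides the distance quoted in the introduction (proximal terminators within about 300 base pairs of the promoter) by the quoted in vivo elongation rate (about 40–50 nucleotides/s) to estimate how soon after initiation RNA polymerase reaches such a terminator. The function name and the assumption of a constant rate with no pausing are illustrative only.

```python
def time_to_reach(distance_nt: float, rate_nt_per_s: float) -> float:
    """Seconds for RNA polymerase to transcribe distance_nt nucleotides
    at a constant elongation rate (assumption: no pausing)."""
    if rate_nt_per_s <= 0:
        raise ValueError("elongation rate must be positive")
    return distance_nt / rate_nt_per_s


# Figures quoted in the text: proximal terminators lie within ~300 bp of the
# promoter, and chains grow at ~40-50 nucleotides/s in vivo.
PROXIMAL_LIMIT_NT = 300
for rate in (40, 50):
    print(f"{rate} nt/s: {time_to_reach(PROXIMAL_LIMIT_NT, rate):.1f} s")
```

Under these assumptions Rho has at most about 6 to 7.5 s to bind the nascent transcript and act before the polymerase has passed a proximal terminator, which is the kind of window that earlier access to the RNA through a Rho-polymerase complex could help it meet.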


The Rho Polypeptide Has Two Distinct Functional Domains
The E. coli rho gene was isolated and sequenced in 1983 by Pinkham and Platt (11). Subsequent comparison of its predicted amino acid sequence with those of other proteins indicated that it has several sequence motifs that are characteristic of ATPases (12) plus a sequence motif that is similar to the RNP1 sequence of eukaryotic RNA-binding proteins (13). The relative positions of these ATPase and RNA-binding sequence motifs are indicated in Fig. 1.
The location of possible loops in Rho was probed by determining where it is most readily cleaved by trypsin. The first cleavages of Rho factor by trypsin occur with about equal probability at two different sites (14), one at Lys 283 (15) and the other at either Arg 128 or Lys 130 (14) (the exact position in this region has not been determined). By using a UV cross-linking assay it was found that oligo(C) binds to the N-terminal fragment produced by the cleavage near residue 130 (14) and that trp t′ RNA reacts with a site that is between Lys 44 and Lys 100 (16). A polypeptide consisting of just the first 130 residues of Rho is also able to bind oligo(C) well, as could a fragment ending at residue 116, whereas a fragment ending at residue 114 could not (17). Thus, the first 116 residues define the minimum RNA-binding domain. This segment also includes the sequence motif that resembles the RNP1 sequence of eukaryotic RNA-binding proteins (Fig. 1). Treatment of ATP-Rho mixtures with ultraviolet light covalently linked ATP to a site in the middle third of the polypeptide (14), within a region containing sequences that are very similar to two highly conserved sequence motifs in many ATPases, ATPase A (also known as the P loop sequence) and ATPase B (Fig. 1). Lys 181, a residue within the ATPase A motif of Rho, is the site for reaction with the ATP analog pyridoxal 5′-diphospho-5′-adenosine (18) and of photoreaction with 8-azido-ATP (19), thus implicating that residue in close contacts with the γ-phosphate and part of the adenosine ring. Altogether, sequences of Rho that are similar to those of other ATPases start at Arg 160 and extend as far as Glu 392 (20). Thus, of the two sites with about equal access to trypsin, the one at Lys 283 is within sequences that are part of an extended ATPase domain whereas the one near residue 130 is clearly between two functional domains.
These results indicate that the Rho polypeptide consists of an N-terminal RNA-binding domain that is a distinct structural and functional entity. It is connected by a linker to an ATP-binding domain that comprises most of the remainder of the polypeptide.

Phylogenetic Comparison of Bacterial rho Gene Homologs
Genes with sequences related to E. coli rho have been isolated from a number of different organisms including several from deeply diverged lineages of the bacteria (20). The widespread occurrence of a rho-like gene suggests that Rho is ubiquitous in the bacteria, while its presence in the most deeply diverged organisms suggests that a rho-like gene was present in the ancestor that was common to the three domains of life, the bacteria, the archaea, and the eukarya. It is not yet known whether the archaea or the eukarya have rho-like genes.
The comparative analysis revealed that the Rho homologs are highly conserved, exhibiting a minimum identity of 50% of their amino acid residues in pairwise comparisons (20). Fig. 1 shows the distribution of the most highly conserved segments throughout the Rho polypeptides. The ATP-binding domain had a particularly high degree of conservation, consisting of some blocks of residues with sequences that are very similar to segments of the α and β subunits of F1-ATPase and of other blocks with sequences that are unique to Rho. The proteins in the database with the closest similarities to the Rho homologs were almost invariably β subunits of the F1-ATPase (AtpB) or subunits of vacuolar, proton-translocating ATPases. Even though Rho is functionally an RNA-DNA helicase (21), it does not show significant similarity to the RNA helicases that are members of a large class of proteins that have the DEAD or DEAH sequence motifs. Thus, as a helicase it appears to be unique. Within the ATP-binding domain of Rhos, the minimum identity with E. coli AtpB is 21%, and the minimum similarity is 43% in pairwise comparisons (20). The high degree of similarity among these proteins suggests that they may be derived from a common ancestor.
Because of this close phylogenetic relationship, the structure of the F1-ATPase that was deduced from x-ray crystallography (22) provides an excellent prototype for the ATP-binding domain of Rho. Based on the alignments of the similar and identical sequences, Miwa et al. (23) have modeled the tertiary structure of the ATP-binding domain of Rho. A diagram of that model is shown in Fig. 2 with some landmarks indicated. This model includes some secondary structural elements that extend into the RNA-binding domain because these also show some limited sequence similarity with the F1-ATPase subunits (23).
The blocks of sequence in E. coli Rho that are most similar to the other ATPases are those that extend from residues 167 to 192 and from residues 251 to 275 (20). The former includes the ATPase A sequence motif, APPKAGKT in E. coli Rho, and is closely similar to the sequences that form the P loop in phosphohydrolases of known structure. The second block includes the ATPase B sequence motif, VIILLD in E. coli Rho, of which the final Asp plays an important role in catalyzing ATP hydrolysis (24).
Two blocks of amino acid residues that are highly conserved within Rho homologs do not exhibit strong similarity to AtpB. Hence, these residues may be important for structural and/or functional properties that are unique to Rho and have been maintained by strong selective pressures during evolution of the bacteria. The first of these conserved sequence blocks extends from residues 297 to 310 in E. coli Rho, and the second block extends from residues 324 to 342 and includes loop R, helix G, and β-strand 8 (Fig. 2).
The RNA-binding domain is more diverged than the ATP-binding domain (Fig. 1). However, one of its most highly conserved segments includes an RNP1-like sequence, DGFGFLR in E. coli Rho. It is known from mutational studies described below to be involved in RNA binding. The part of the RNP1-like sequence in E. coli Rho that is most like the consensus sequence of eukaryotic RNA-binding proteins is the core sequence GFGF (Fig. 1). In Rho the first of these Gly residues is preceded by a highly conserved Asp residue. In contrast, the RNP1 consensus has a conserved basic residue at that position (25). The Rhos also have an RNP2 motif (IYV in E. coli Rho), which is another characteristic of the eukaryotic RNP domain proteins. However, in Rho the RNP2 motif is about 12 residues downstream of RNP1, at residues 79–81, whereas in the eukaryotic proteins it is usually at least 30 residues upstream of RNP1 (25). In this respect Rho is similar to a class of single-stranded DNA-binding proteins in bacteria called cold shock proteins, including CspA and CspB (26). Thus, in spite of the partial similarity, the RNA-binding domain of Rho does not seem to be evolutionarily related to the eukaryotic RNA-binding proteins. The fact that the Rhos and the eukaryotic RNA-binding domain proteins have the same GFGF core sequence is likely a consequence of convergent evolution.
An interesting variant of the structure of the RNA-binding domain has been found with Rho proteins from organisms in the high G + C Gram-positive phylogenetic branch. This branch includes the Streptomycetes, Mycobacteria, and Micrococcus luteus. The rho homolog genes from three organisms in this group have been isolated and sequenced (27). They all encode larger polypeptides than do the rho genes isolated from other organisms (~690 versus 420 residues) with most of the differences resulting from an insert of about 260 residues in the RNA-binding domain. These inserts are in the phylogenetically divergent region between the first and second conserved segments (residues 41–52 of E. coli Rho (Fig. 1)). Although all three organisms have an insert of very unusual amino acid composition, the sequences of these inserts are not conserved.
The M. luteus insert starts with an Ala-rich segment followed by a segment that has a large proportion of Arg, Asp, Gly, and Asn residues (27). It also has a very unusual 238-residue segment that lacks any hydrophobic residues and is thus unlikely to have ordered secondary or tertiary structures. Although the role of these inserts is not known, the evidence that a similar insert is present in other organisms from the same phylogenetic group suggests that the inserts arose as an evolutionary adaptation. These organisms have an unusually high G + C content in their DNA, and since G residues have a strong propensity to pair with other residues, their mRNA molecules are likely to be more highly structured than RNA from organisms with fewer G residues. E. coli Rho is known to initiate its action at sites on the nascent transcript that have very little secondary structure (5). One possible adaptation that has been made in these organisms is to have a Rho factor that can initiate its action on a more highly structured RNA. Indeed, M. luteus Rho was found to terminate transcription of cro DNA with E. coli RNA polymerase at sites that are not accessible to E. coli Rho due to the structure of the transcript (27).

Residues That Affect Primary Function within the Domains
The involvement of residues in the RNP1-like sequence for RNA binding was established from studies of the functional defects of specific mutant Rho factors. Mutants in which both the Phe residues in the RNP1-like sequence (Phe 62, Phe 64) were changed to either leucines or alanines had lower affinities for RNA, with the double Ala mutant being considerably more defective than the double Leu mutant (16). Of these two, Phe 62 appears to be more sensitive to change than Phe 64. This became evident from the results of a study designed to determine which, if any, of the 19 residues in and preceding the RNP1-like sequence were critical for Rho function. A large number of randomly produced mutants with single residue changes were screened for defects in termination function. This approach revealed three residues in the RNP1-like sequence that were particularly sensitive to change: Asp 60, Phe 62, and Arg 66. When Phe 62 was changed to a Ser, the resulting protein had a 100-fold lower affinity for a cro RNA (a transcript terminated by Rho action) and was very defective for transcription in vitro, thus implicating this residue in a critical, non-ionic interaction with RNA. Mutations of the conserved Arg 66 residue to a number of other residues had strong effects on Rho function in vivo, and the change of Asp 60 to a Gly residue caused Rho to bind more tightly to RNA, particularly to RNA molecules lacking a rut sequence needed to activate termination. Thus, three residues in the phylogenetically conserved RNP1-like sequence are particularly important for Rho function.

Figure legend fragment (apparently for the model in Fig. 2; the beginning was lost in extraction): ... (17). The two primary trypsin cleavage sites are shown. P loop identifies the loop with the ATPase A sequence. Loops Q and R are the next two loops that face the same side of the tertiary structure, and their residue numbers are indicated. In the F1-ATPase subunits the β-strands identified on the model as 1 and 2 form part of an extended sheet with strands 5, 4, 6, 7, 3, 8, and 9. The predictions of the three C-terminal α helices labeled 1c, 2c, and 3c are tentative because they are based on very limited sequence similarity (23). The N-terminal segment, which shows no sequence similarity with the F1-ATPase subunits, is depicted as an oval at the top of the picture.
Within the ATP-binding domain, mutants in which either of the two phylogenetically conserved lysines at residues 181 and 184 in the ATPase A sequence were changed to Gln residues had decreased termination efficiency, and the change of Asp 265, a residue in the ATPase B motif, to Asn yielded a mutant Rho with very low ATPase and undetectable termination (28). This result was consistent with the evidence that the corresponding Asp residue in other ATPases has an essential role in catalyzing ATP hydrolysis (24), confirming the role of this sequence motif in ATPase function.
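The residue numbers quoted for these motifs can be checked against the motif sequences themselves. The sketch below is a bookkeeping aid rather than an analysis from the cited studies. It assumes that each motif is contiguous, that Asp 60 is the first residue of DGFGFLR, that Lys 181 is the first Lys of APPKAGKT, and that Asp 265 is the last residue of VIILLD. It then confirms that the other residue numbers given in the text follow from those anchors. The helper name number_motif is hypothetical.

```python
def number_motif(motif, anchor_offset, anchor_number):
    """Return [(amino_acid, residue_number), ...] for a contiguous motif,
    given the residue number of the motif position at anchor_offset (0-based)."""
    start = anchor_number - anchor_offset
    return [(aa, start + i) for i, aa in enumerate(motif)]


# Anchors taken from the text (their placement within each motif is assumed):
# Asp 60 opens the RNP1-like sequence, Lys 181 is the first Lys of the
# ATPase A motif, and Asp 265 closes the ATPase B motif.
rnp1 = number_motif("DGFGFLR", 0, 60)
atpase_a = number_motif("APPKAGKT", 3, 181)
atpase_b = number_motif("VIILLD", 5, 265)

# Residues that the text names independently of the anchors.
assert ("F", 62) in rnp1 and ("F", 64) in rnp1 and ("R", 66) in rnp1
assert ("K", 184) in atpase_a
# Both ATPase motifs fall inside the conserved blocks 167-192 and 251-275.
assert 167 <= atpase_a[0][1] and atpase_a[-1][1] <= 192
assert 251 <= atpase_b[0][1] and atpase_b[-1][1] <= 275
# RNP2 (IYV, residues 79-81) begins after a 12-residue gap following Arg 66.
assert 79 - rnp1[-1][1] - 1 == 12

for name, motif in (("RNP1-like", rnp1), ("ATPase A", atpase_a), ("ATPase B", atpase_b)):
    print(name, f"{motif[0][1]}-{motif[-1][1]}")
```

All of the checks pass, so the numbering used for the mutants (Asp 60, Phe 62, Phe 64, Arg 66, Lys 181, Lys 184, Asp 265) is internally consistent with the motif sequences and conserved blocks described earlier.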
From a collection of a number of defective mutants with changes in residues in the C-terminal 100 amino acids, Miwa et al. (23) identified several that affect the binding of ATP and have modest to severe effects on ATP hydrolysis. Most of the changes were between residues 326 and 366 and include regions with homology to the F1-ATPase subunits. In the model for the ATP-binding domain of Rho (Fig. 2) the changed residues that affected ATP hydrolysis were all in contact with or near various functional groups on the ATP, thus supporting the use of the F1-ATPase structure as a model for the ATP-binding domain of Rho.

Evidence for Cooperative Interactions between the Two Domains
Several mutants of E. coli Rho have been isolated based on defects in termination function in vivo. Recently the residues changed by some of these mutations have been determined, including rho1 and rho115 (29,30). Besides being defective in transcription termination in vivo these two mutants have similar functional defects. Both bind RNA with about the same affinity as does wild-type Rho but are defective in activation of RNA-dependent ATP hydrolysis. They also have greatly increased Km for oligo(rC) in assays in which the hydrolysis of ATP depends on the presence of both poly(dC) and oligo(rC) (30,31). Since poly(dC) can bind to the primary site for polynucleotides in Rho, the change in Km is interpreted as affecting the putative secondary site (32,34). In spite of their similarity in function, the mutational changes in these two mutants are in very different locations; rho1 has a change of Lys 352 to Glu (29) while rho115 has two changes, G99V and P235H, of which the G99V change is responsible for the difference in Km values for oligo(rC) (30). The Lys 352 change is of a conserved residue deep in the ATP-binding domain while Gly 99 is a conserved residue of the RNA-binding domain downstream from the RNP1 and RNP2 motifs. Miwa et al. (23) also isolated some mutants of this type that had increased Km for oligo(C) but were little changed in primary site RNA binding. One change is of an unconserved residue at the C-terminal end (M416K) while another is of a conserved residue in the ATP-binding domain (M327T). Thus, mutations that yield this phenotype are not clustered in one or two specific regions. Instead changes in a number of different regions can affect the function of this proposed secondary RNA-binding site. Although some of these changes might be affecting directly the interaction of Rho with RNA at the putative secondary site, all or most of these mutants could be merely affecting steps in the coupling of RNA binding with ATP hydrolysis. These steps are likely to involve multiple inter- and intrasubunit rearrangements and thus be sensitive to changes at many different locations.
Another mutant that was isolated because it is defective for Rho-dependent termination in vivo is rho201. The Rho factor isolated from that mutant is also defective in its function in vitro. The protein has a single residue change, a Phe to Cys at residue 232 (35). The biochemical characterization of the defect of this Rho factor shows that it has a 100-fold lower affinity for mRNA and a greater rate of ATP hydrolysis with nonspecific RNA than does wild-type Rho. Thus, even though the mutational change is of a highly conserved Phe residue in the region of the ATP-binding domain between the ATPase A and ATPase B sequence, its primary defect is in RNA binding. This result suggests that mutations that are in the ATP-binding domain can affect the function of the RNA-binding domain.

Functional Changes Involving Non-conserved Residues
An example of a functional change caused by an alteration of a residue in a phylogenetically unconserved region is provided by the mutation Met 416 to Lys. Another such mutation is Phe 3 to Leu (36), which changes the specificity of Rho action by making it more active (37). In addition, two mutations with functional defects have been isolated in the phylogenetically divergent connector region. One is an Asp to Asn change at residue 156 (36); the other is the change of Glu to Asp at residue 134 (29). The E134D mutant behaves like a classical polarity suppressor mutation (like rho1), although it was isolated as a suppressor of a defect in NusA (29).
Some mutations in Rho factor create defects in the cell that prevent growth of certain bacteriophages, including λ and T4. Two mutants in this class (called rho (nusD)) are rho026 and rho4008. The first has two changes, Pro 103 to Leu and Ser 153 to Tyr, while rho4008 has only Pro 103 to Leu. Although the mechanism of the bacteriophage growth exclusion has not yet been elucidated, these Rho factors cause transcription of the cro gene DNA template to be terminated at sites that precede those used with wild-type Rho alone but are identical to those used when NusG is also present. Thus, this mutation has made the Rho factor less dependent on NusG, which is consistent with its partial termination activity after NusG depletion in vivo (38).

Is There a Site for RNA in the ATP-binding Domain?
Several seemingly unrelated mutations that affect interactions with RNA (both primary and secondary) are actually clustered in the part of the tertiary structure of the ATP-binding domain that includes loop R, helix G, and β-strand 8 (23) (Fig. 2). To understand how the side chains in this region might be involved in interactions with RNA, we have to consider how the subunits might be arranged in Rho factor. Electron micrographs of negatively stained images (33,39) and cryoelectron microscopic studies (40) revealed a distinct ring-shaped structure in which six globular subunits were arranged around a hollow core. This organization closely resembles that of the α and β subunits of F1-ATPase. In the F1-ATPase the six isologous subunits have a pseudo-C6 symmetry (22) and are arranged with all the N-terminal domains on one side and the C-terminal domains on the other. This organization differs from the symmetry that has been proposed for Rho; based on an interpretation of the intermediates formed during partial protein-protein cross-linking studies, Geiselmann et al. (41) proposed that Rho has D3 symmetry. However, more recent studies of cross-linking intermediates are more readily interpreted in terms of a C6 symmetry, like the F1-ATPase structure. Geiselmann et al. (41) also found that fluorescent groups attached to Cys 202 in Rho were separated beyond the critical Förster distance of ~45 Å in the Rho hexamer. They argued that this result was most consistent with a D3 structure. However, in the model of Miwa et al. (23) Cys 202 is near the N-terminal end of β-strand 4 or close to the surface periphery of the Rho hexamer (Fig. 2). With a ~60-Å radius for the outer boundary of the hexamer (40), the individual Cys 202 residues would be separated by ~60 Å in the model with C6 symmetry and thus be consistent with the fluorescence studies.
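The separation quoted for Cys 202 follows from simple ring geometry. As a minimal sketch, assume ideal C6 symmetry and assume that each Cys 202 lies at the outer radius of the ring; both are idealizations of the model and are not measured values. Equivalent points on subunits k positions apart are then separated by the chord length 2R sin(πk/6). The function name and constants below are illustrative.

```python
import math


def ring_separations(radius, n_subunits=6):
    """Distances between a point on one subunit and the equivalent point on
    each other subunit, for n identical points evenly spaced on a circle
    (ideal C_n symmetry). Returns {step: distance}, step = subunits apart."""
    return {
        k: 2 * radius * math.sin(math.pi * k / n_subunits)
        for k in range(1, n_subunits // 2 + 1)
    }


FORSTER_DISTANCE = 45.0  # Å, value quoted in the text
OUTER_RADIUS = 60.0      # Å, value quoted in the text; Cys 202 assumed at this radius

for step, d in ring_separations(OUTER_RADIUS).items():
    print(f"{step} subunit(s) apart: {d:.0f} Å, "
          f"beyond Förster distance: {d > FORSTER_DISTANCE}")
```

For a hexagon the nearest-neighbor separation equals the radius, so adjacent Cys 202 residues would lie about 60 Å apart and more distant pairs farther still. Under these assumptions every pair exceeds the ~45 Å Förster distance as long as the labeled residue sits more than about 45 Å from the ring axis.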
Assuming that the quaternary structure of Rho is similar to that of the F1-ATPase, the six subunits in the hexamer would be oriented with all of the RNA-binding domains on one face of the ring and with the parts of the subunits containing loop R, helix G, and β-strand 8 exposed to the solvent in the inner hole of the ring. Thus, the fact that many mutations that affect interactions with RNA are clustered in the segments that extend into the hole suggests that the hole is a site of interaction with RNA. This arrangement raises the possibility that Rho could use the six RNA-binding domains to form the extended primary binding site with the secondary RNA-binding site located in the hole of the ring. Fig. 3A shows a diagram of the six subunits of Rho arranged with the six RNA-binding domains, colored red, all on one face of the ring structure, and Fig. 3B represents a complex of Rho with an RNA molecule indicated by a blue line. It is shown making extensive contact along the surface comprised of the RNA-binding domains of the six subunits (the primary site) and with its 3′-end inserted through the hole in the center of the ring (the secondary site). Distinct conformational shifts in the structure of a subunit could occur with each of the three following steps: binding of ATP; conversion of ATP to ADP + Pi; and release of ADP + Pi. This directed set of conformational changes could be tightly coupled to contacts of the protein with the RNA in the hole and could act to pull the 3′-end through the hole. Since the 5′-portion of the RNA would be held by bonds along the surface of the RNA-binding domain, the interactions set forth in this model would be a type of tethered tracking, a mode of translocation that was suggested from studies of Faus and Richardson (42) and of Steinmetz and Platt (43).
After the 5′-portion of the rut site in the nascent transcript forms stable contacts with the extended RNA-binding domain surface of Rho, a dissociation and reassociation of one or more subunits, which is known to occur readily with free Rho (44–46) and with Rho bound to RNA, would allow the 3′-tail of the rut region of the transcript to be captured in the hole. The electron micrographs of negatively stained, unbound Rho often reveal a distinct break in the ring (33,40), which might also be an open site for allowing RNA to enter the hole. Once captured, the conformational changes associated with concerted rounds of ATP hydrolysis would propel Rho in the 3′ direction along the RNA and provide a force that is sufficient to dissociate the transcript from RNA polymerase. This model differs from two others that have been proposed recently (47,48) in ascribing a major role to a part of the protein that has not previously been implicated. Although the parts of Rho that comprise loop R, helix G, and β-strand 8 have not been shown yet to interact directly with RNA, the corresponding segments of RecA, another protein with a core ATP-binding domain that is topologically identical to that of F1-ATPase (23,49), make direct contact with DNA. Thus, there is a precedent for implicating that region of Rho. Also recent work on the structure and mechanism of the RuvB component of a multisubunit complex that promotes DNA branch migration indicated that it may act by pulling DNA molecules through the hole of a hexameric ring (50). However, this model for Rho is very speculative, and evidence that the RNA is inserted through the hole, let alone is translocated through the hole, is lacking.
The results obtained from the phylogenetic analysis and studies with mutant Rho factors have provided new insights into its structure and function, and these have been used to formulate a speculative model for its mechanism of action. One purpose for proposing this model is to stimulate the design of experiments that will elucidate the true mechanism. Until the structure of Rho has been determined by x-ray crystallographic analysis, we will have to continue to rely on these indirect approaches. Since one of the key differences in the model is the symmetry, further cross-linking studies and possibly immunoelectron micrographs might resolve this issue. Eventually, this work will lead to an understanding of how this protein can couple the chemical energy derived from ATP hydrolysis into the mechanical actions that dissociate a transcript from its complex with RNA polymerase and DNA.