Yeast Nab3 Protein Contains a Self-assembly Domain Found in Human Heterogeneous Nuclear Ribonucleoprotein-C (hnRNP-C) That Is Necessary for Transcription Termination

Background: Termination factor Nab3 binds RNA, but its function and structure are poorly understood.
Results: The carboxyl terminus of Nab3 functions as a self-assembly region important for termination.
Conclusion: Nab3 resembles hnRNP-C in this new function, indicating that it is related to mammalian RNA-packaging proteins.
Significance: Nab3 coats RNA to form a higher order assembly due to interactions among RNA, RNA polymerase, and hnRNPs.
Nab3 is an RNA-binding protein whose function is important for terminating transcription by RNA polymerase II. It co-assembles with Nrd1, and the resulting heterodimer of these heterogeneous nuclear ribonucleoprotein (hnRNP)-like proteins interacts with the nascent transcript and RNA polymerase II. Previous genetic analysis showed that a short carboxyl-terminal region of Nab3 is functionally important for termination and is located far from the Nab3 RNA recognition domain in the primary sequence. The domain is structurally homologous to hnRNP-C from higher organisms. Here we provide biochemical evidence that this short region is sufficient to enable self-assembly of Nab3 into a tetrameric form in a manner similar to the cognate region of human hnRNP-C. Within this region, there is a stretch of low complexity protein sequence (16 glutamines) adjacent to a putative α-helix that potentiates the ability of the conserved region to self-assemble. The glutamine stretch and the final 18 amino acids of Nab3 are both important for termination in living yeast cells. The findings herein describe an additional avenue by which these hnRNP-like proteins can polymerize on target transcripts. This process is independent of, but acts in concert with, the interactions of the proteins with RNA and RNA polymerase and extends the relationship of Nab3 as a functional orthologue of a higher eukaryotic hnRNP.

Saccharomyces cerevisiae Nab3 is an hnRNP-like, RNA-binding protein that is involved in the termination of transcription of small, noncoding RNAs such as snRNAs and cryptic unstable (CUT) RNAs such as that involved in regulation of the IMD2 gene (1,2). IMD2 encodes IMP dehydrogenase, an enzyme important for GTP synthesis. Cellular GTP levels control its expression. When GTP is abundant, RNA polymerase II initiates transcription at an upstream location where the incorporation of tandem guanines is called for by the template. Just downstream, a terminator is encountered that results in a short CUT RNA. When GTP concentrations are too low to support this start site choice, RNA polymerase II initiates with an adenylate at +1 that is downstream of the terminator. This terminator bypass results in synthesis of IMD2 mRNA in GTP-depleted states.
During termination, Nab3 associates with Nrd1, another hnRNP-like protein that also contains a conserved RNA recognition motif (RRM) (3). One model of transcription termination suggests that this complex associates with the repeated carboxyl-terminal domain of the large subunit of RNA polymerase II through the Nrd1 subunit (4). As well, Nab3 and Nrd1 recognize specific RNA sequences in target RNA enabling the proteins to associate directly with RNA (4). The number and arrangement of such sites vary from substrate to substrate. Thus, nascent RNA destined for termination by this mechanism accumulates a coating of Nrd1 and Nab3 proteins during RNA chain elongation, marking it for the specific mode of termination that characterizes this class of RNA. The process also involves the putative RNA helicase Sen1 and the TRAMP complex, both of which interact with the Nrd1/Nab3 proteins to activate termination and RNA processing (2).
Whereas Nrd1 appears to be the yeast orthologue of human hnRNPs SCAF8 and SCAF4 (3-5) and contains a CTD-interacting domain and an RRM, Nab3 is less obviously related to a particular mammalian hnRNP protein based upon sequence homology. Like a number of other hnRNPs, it contains amino acid repeats and tracts of homopolymeric residues including aspartic acid/glutamic acid and proline/glutamine rich domains that have not yet been assigned function. Polyglutamine stretches are of biomedical interest because a variety of proteins suffer mutations that lengthen the homopolymeric tract resulting in multiple human diseases (6). Whereas long tracts are associated with protein aggregation, wild type runs of polyglutamine in proteins are important for protein function where they can augment protein-protein interactions depending upon their length and the sequences adjacent to them (6,7). In particular, polyglutamine regions can stabilize the co-assembly of adjacent coiled-coil structures (8,9).
Little is known of the function of most of the essential Nab3 polypeptide outside of the RRM and Nrd1-interaction domains, including the numerous runs of consecutive glutamines and other amino acids. Furthermore, the protein is difficult to express in recombinant form to study its function as a specialized ribonucleoprotein complex (10).
Recently, we revealed a role for a small segment of the carboxyl-terminal "tail" of Nab3 in transcription termination at the CUT associated with IMD2 regulation (1,11). From spontaneous mutations in yeast, we learned that the final 19 amino acids at the Nab3 carboxyl terminus are important for function. Interestingly, deletion of the last three amino acids recapitulated the phenotype observed when the entire tail was lost, and even deletion of only the last amino acid of Nab3 resulted in significant loss of function (11). Furthermore, this region shares structural homology with an α-helical segment of mammalian hnRNP-C1 (and hnRNP-C2) that enables it to tetramerize (12). Curiously, Nab3 possesses a tract of 16 glutamines of unknown function adjacent to this hnRNP-C-like tail. Polyglutamine tracts in proteins, and other simple amino acid repeats, have been implicated in the function of adjacent secondary structural elements and protein-protein interaction domains (9).
Here we show that the yeast Nab3 tail domain is sufficient to enable assembly of recombinant fusion proteins into higher order multimers, much as the hnRNP-C region does. The stretch of glutamines adjacent to the terminal α-helix facilitates its role as a multimerization domain, and loss of this stretch impairs the function of Nab3 in termination. These results provide evidence in favor of a model that suggests that deposition of multiple Nrd1-Nab3 heterodimers onto RNA substrates is important for the packaging of substrate RNAs and for the optimal functioning of these yeast hnRNP-like proteins in transcription termination and RNA degradation/processing.

MATERIALS AND METHODS
Plasmid Construction, Protein Expression, and Purification-Using oligonucleotides or PCR, portions of the carboxyl-terminal sequences of Nab3 were added to the carboxyl terminus of thioredoxin for expression in Escherichia coli from the pET32a vector. The oligonucleotide pair 5′-gatccggcgggcaacaacgtgcagagcctgctggatagcctggcgaaactgcagaaataac-3′ and 5′-tcgagttatttctgcagtttcgccaggctatccagcaggctctgcacgttgttgcccgccg-3′ encoding the final 18 amino acids of Nab3 were annealed, phosphorylated, and ligated into the XhoI and BglII sites of pET32a to generate pET32a18aa. A PCR product encoding the final 40 amino acids was generated from the plasmid pRS315Nab3FL (11) using oligonucleotides 5′-atatagatctacaccaacctccgcc-3′ and 5′-atatctcgagctatttttgtagttttgctaaac-3′, digested with BglII and XhoI, and inserted into similarly cut pET32a to yield pET32a40aa. pET32a-tov1 and pET32a-tov4 were generated by inserting into pET32a XhoI- and BglII-digested PCR products generated from genomic DNA from tov1 and tov4 yeast strains (11), respectively, using the oligonucleotides 5′-atatagatctacaccaacctccgcc-3′ and 5′-atatctcgagctatttttgtagttttgctaaac-3′.
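As a consistency check on the pET32a18aa insert, the oligonucleotide pair listed above can be translated and annealed in silico. The Python sketch below is illustrative only and is not part of the original methods. It assumes the standard genetic code, a reading frame that starts at the first base of the sense oligonucleotide (the frame that ends in a stop codon), and 4-nucleotide 5′ overhangs on each strand; the function names are hypothetical.

```python
from itertools import product

# Standard genetic code, built in TCAG order (assumption: no alternative code applies).
BASES = "tcag"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Oligonucleotide pair from the text, written 5' to 3'.
SENSE = "gatccggcgggcaacaacgtgcagagcctgctggatagcctggcgaaactgcagaaataac"
ANTISENSE = "tcgagttatttctgcagtttcgccaggctatccagcaggctctgcacgttgttgcccgccg"


def reverse_complement(dna):
    """Return the reverse complement of a lowercase DNA string."""
    return dna.translate(str.maketrans("acgt", "tgca"))[::-1]


def translate(dna):
    """Translate from the first base and stop at the first stop codon."""
    peptide = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        peptide.append(amino_acid)
    return "".join(peptide)


peptide = translate(SENSE)
# The first codon (gat) spans the 5' overhang; the last 18 residues are the Nab3 tail.
tail = peptide[-18:]
# With 4-nt 5' overhangs removed, the two strands should be exact reverse complements.
duplex_ok = reverse_complement(ANTISENSE[4:]) == SENSE[4:]

print(tail, len(tail), duplex_ok)
```

Under these assumptions the insert encodes an 18-residue peptide followed by a stop codon, and the two strands pair over their full length apart from the overhangs.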
Plasmids were introduced into BL21(DE3) E. coli, and protein expression was induced with isopropylthiogalactoside for 3 h at 37°C. Lysozyme-treated cells were broken by sonication, and the lysate was clarified by centrifugation for 30 min at 27,000 × g. The resulting His6-tagged thioredoxin fusion proteins were bound to immobilized nickel at 10 mM imidazole and eluted at 250 mM. They were then dialyzed into 20 mM Tris, pH 7.5, 50 mM NaCl, 1 mM EDTA and chromatographed on QAE-Sephadex. Flow-through protein was collected and concentrated by centrifugal filtration as needed. Where indicated, the Nab3 polypeptides were excised from the carboxyl terminus of the thioredoxin fusion protein using thrombin. A second round of nickel chromatography was used to separate the His6-tagged thioredoxin from the free Nab3 polypeptide.
pRS315-Nab3 was the starting plasmid from which the Gln769 to Gln783 polyglutamine stretch was deleted by in vitro replication with Phusion DNA polymerase (New England Biolabs) and mutagenic oligonucleotides 5′-cctgctggcaataatgttcaaagtctatta-3′ and 5′-aggtggcggaggttggtgtgac-3′. Replicated DNA was used to transform E. coli, and the loss of the region encoding the glutamine stretch in the new plasmid (pRS315-Nab3-NoQ) was confirmed by DNA sequencing.
The strains used in this study are presented in Table 1. The LEU2-marked plasmids pRS315-Nab3-L800A, pRS315-Nab3-Q801A, and pRS315-Nab3-NoQ were introduced independently into a Δnab3 strain that also contained the plasmid pRS316-nab3-11 (DY30229 (11)). Following growth on SC ura− leu− plates, colonies were grown at 30°C on SC leu− and tested for loss of the URA3-bearing plasmid by growth on 5-fluoroorotic acid. Following this plasmid shuffle, yeast were then transformed with the GAL-terminator-GFP reporter plasmid described previously (11) and were named DY3126, DY3127, and DY3129. The resulting yeast strains, and a control strain expressing wild type NAB3 (DY3036 (11)), were assayed at an excitation wavelength of 488 nm on an LSRII (BD Bioscience) flow cytometer as described (11). The lithium acetate method of transformation was used throughout (13).
Protein Cross-linking-Cross-linking reactions were prepared as described by Whitson et al. (12). Briefly, purified recombinant proteins (0.5-1 μg/μl) were incubated for 30 min at 22°C with freshly dissolved 5 mM (unless otherwise indicated) bis(sulfosuccinimidyl)suberate (BS3; Thermo Scientific) in 20 mM Tris, pH 7.5, 50 mM NaCl, 1 mM EDTA. Reactions were terminated by adjustment to 50 mM glycine. Proteins were precipitated with 10% (v/v) trichloroacetic acid and resolved on SDS gels composed of a resolving slab of 9.5% polyacrylamide layered upon an 11% gel to better separate monomeric from polymeric species. Where indicated, 16% polyacrylamide Tris-Tricine gels were used to resolve small proteins as described by Schagger (14). Gels were stained with fresh Coomassie Brilliant Blue (0.2% w/v).

RESULTS
Using chemical cross-linking, a 27-residue α-helix from hnRNP-C was shown to self-assemble into a tetrameric bundle when appended to a heterologous protein (E. coli thioredoxin (12)). Because the carboxyl-terminal tail of Nab3 is structurally related to the hnRNP-C self-assembling helix (11), we employed this assay to test whether the cognate region of the yeast protein could similarly polymerize. A family of recombinant fusion proteins in which various portions of the Nab3 carboxyl terminus were fused to thioredoxin was expressed in bacteria and purified. The homobifunctional reagent BS3 was used to cross-link the purified proteins, and the resulting multimers were separated by SDS-PAGE. The 18-amino acid terminus of Nab3 (residues 785-802; Fig. 1, Nab3-18aa) was able to facilitate intermolecular cross-linking when attached to thioredoxin (Fig. 2A, Nab3-18). More effective cross-linking was observed when the carboxyl-terminal 40 amino acids of Nab3 (Fig. 1, Nab3-40aa) were added to the carboxyl terminus of thioredoxin (Fig. 2A, Nab3-40). This construct included a stretch of 16 glutamines immediately adjacent to the 18-amino acid tail that was identified as important by mutation (11). The pattern of higher order multimers resembled what was seen for the cross-linking of a thioredoxin fusion protein to which the human hnRNP-C assembly helix was attached (Fig. 2A, hnRNP-C-27), as described by Whitson et al. (12). These cross-linking data are evidence that the Nab3 tail is functionally similar to the hnRNP-C assembly domain and that it is likely to possess α-helical character. The similarity in disposition of hydrophobic and hydrophilic residues seen when the primary sequences of yeast Nab3 and human hnRNP-C were displayed in a helical wheel representation is also consistent with this interpretation (Fig. 1).
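The idea behind a helical wheel can be sketched in a few lines of Python. The sketch below is illustrative and is not the tool used to prepare Fig. 1. It assumes an ideal α-helix (3.6 residues per turn, so 100° of rotation per residue), takes the 18-residue tail sequence as the translation of the pET32a18aa oligonucleotide given under "Materials and Methods," and uses a deliberately simple, hypothetical classification of hydrophobic residues.

```python
# Nab3 residues 785-802, as encoded by the pET32a18aa oligonucleotide (translated in silico).
TAIL = "PAGNNVQSLLDSLAKLQK"
FIRST_RESIDUE = 785

DEGREES_PER_RESIDUE = 100        # ideal alpha-helix: 360 degrees / 3.6 residues per turn
HYDROPHOBIC = set("AVLIMFWC")    # assumption: a coarse, illustrative classification


def helical_wheel(sequence, first_residue, step=DEGREES_PER_RESIDUE):
    """Return (residue number, amino acid, angle in degrees) for each position."""
    return [
        (first_residue + i, amino_acid, (i * step) % 360)
        for i, amino_acid in enumerate(sequence)
    ]


wheel = helical_wheel(TAIL, FIRST_RESIDUE)
for number, amino_acid, angle in wheel:
    kind = "hydrophobic" if amino_acid in HYDROPHOBIC else "polar/other"
    print(f"{amino_acid}{number}\t{angle:3d} deg\t{kind}")
```

Residues whose angles cluster on one side of the wheel would lie on the same face of the helix; whether the hydrophobic residues of the Nab3 tail actually do so is a question for the structural data, not for this sketch.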
The structure of the hnRNP-C self-assembly domain and the helical wheel modeling of the related Nab3 region suggest that this protein-protein interaction results from a packing of hydrophobic faces of the helices into a bundle. This implies that the interaction would be relatively insensitive to high ionic strength. To test this, we repeated the cross-linking assay in 0.5 M and 1 M NaCl. The thioredoxin-Nab3 40-amino acid fusion protein could be readily cross-linked at these high ionic strengths (data not shown).
A spontaneous mutation in Nab3 (tov1) changed the final glutamine in the 16-residue repeat to a stop codon, resulting in a Nab3 protein that lacks this residue and the carboxyl-terminal 18-amino acid tail (Fig. 1, Nab3-tov1) (11). To test whether this derivative of Nab3 could self-associate, we generated a thioredoxin fusion derivative containing only the 21 amino acids (763-783) of Nab3 on the amino-terminal side of the stop codon. In other words, we tested a fusion protein with the last 40 amino acids of Nab3 from which the 18-amino acid tail was removed, leaving 15 glutamines at the fusion protein carboxyl terminus. This protein also generated multimers (Fig. 2B, tov1). Similarly, we recapitulated a second tov mutant, tov4, which had a nonsense mutation within the glutamine repeat tract. The construct had a thioredoxin-Nab3 fusion polypeptide that extends only 7 residues into the glutamine repeat (residues 763-775; Fig. 1, Nab3-tov4). This derivative also assembled into higher order species as determined by chemical cross-linking (Fig. 2B, tov4). For the two tov mutant derivatives of Nab3, as well as the complete 40-amino acid tail, a small amount of the dimeric form was detectable in the absence of chemical cross-linking reagent (Fig. 2B, − lanes). More of this denaturation-resistant form was observed for the 40-amino acid piece of Nab3 versus the 13- or 21-amino acid segments representing tov mutants. We conclude that both the terminal 18 amino acids and the adjacent 16 glutamines of Nab3 contribute independently to the ability to self-associate. Together they provide a multimerization interface such as that seen for hnRNP-C.
To confirm that multimerization was due to the Nab3 portion of the fusion proteins, the 18- and 40-amino acid portions of Nab3 were removed from thioredoxin by digestion with thrombin at a site engineered into the expression plasmid. Free thioredoxin was separated from the cleaved Nab3 tails by metal affinity chromatography, and the small isolated tails were cross-linked with BS3 and analyzed by Tris-Tricine PAGE. Both the 18- and 40-amino acid pieces of Nab3 formed higher order polypeptides following cross-linking, with the larger 40-amino acid piece doing so more robustly (Fig. 3). Interestingly, multimers of the 40-amino acid piece were apparent in the absence of cross-linking reagent, similar to what has been seen for the 27-amino acid α-helix from human hnRNP-C and other polyglutamine-containing proteins (12,15). This independently confirmed that the carboxyl-terminal region of Nab3 can self-associate and that the glutamine tract augments this property.
Fig. 1 (legend, partial). Derivatives of the Nab3 terminus previously identified by mutation are shown below the alignment and include the part of this region remaining in the tov1 and tov4 mutants. The peptides corresponding to these yeast sequences were appended to thioredoxin for analysis of self-association. Helical wheel representations for the primary sequences of yeast Nab3 and human hnRNP-C were generated using the Helical Wheel Applet.
As an additional test of the ability of the Nab3 tail to self-assemble, we subjected the thioredoxin-Nab3 40-amino acid fusion protein to gel filtration on Superdex 200. Two protein peaks were resolved in migration positions consistent with the sizes of monomer and tetramer based on the positions of reference size markers (data not shown). When the monomeric peak was pooled and gel-filtered a second time, it again was separable as two peaks, consistent with an equilibrium between monomeric and assembled versions of the fusion protein (Fig. 4).
We next tested whether yeast could grow following removal of the carboxyl-terminal 40 amino acids of Nab3 that contain the glutamine stretch and the hnRNP-C homology region (Fig. 1). Through plasmid shuffling, we obtained a strain of yeast containing such an allele of Nab3 (nab3Δ40) on a plasmid covering a deletion of chromosomal NAB3. This strain was viable but grew slowly, as observed previously for the spontaneous mutant of Nab3 (tov1) that lacked its carboxyl-terminal 19 residues (Fig. 5A, nab3Δ40 and nab3Δ19). Similarly, we generated a strain containing mutated NAB3 in which only the codons for the 16-glutamine tract were deleted in the context of an otherwise wild type gene. This strain grew as well as wild type (Fig. 5A, nab3ΔQ16). A more extensive removal of the carboxyl-terminal 134 amino acids of Nab3 proved to be lethal (data not shown), indicating that another important function exists slightly upstream in the protein.
Fig. 3. Cross-linking of Nab3 tail peptides. The Nab3 18- or 40-amino acid carboxyl-terminal peptide expressed in E. coli was released from its thioredoxin tag by cleavage with thrombin. The released thioredoxin was removed by nickel affinity chromatography, and it (right) or the free Nab3 tail peptides (left) were cross-linked with the indicated concentration of BS3. Polypeptides were separated on a 16% polyacrylamide Tris-Tricine gel and stained.
Loss of the 18-amino acid tail from Nab3 negatively impacts its termination activity (11). To test whether the glutamines adjacent to this domain were important for the biological function of Nab3, we returned to a flow cytometry assay previously used to evaluate termination efficiency in living cells (11). In this assay, the inducible GAL1 promoter is placed upstream of the green fluorescent protein (GFP) reading frame and introduced into yeast on a plasmid. Galactose-treated cells are strongly fluorescent unless the IMD2 terminator is inserted between the promoter and GFP. Mutations in the terminator sequence, or in genes encoding trans-acting factors involved in termination (e.g. NAB3, SSU72, or PCF11), restore fluorescence by abrogating terminator function (11).
We designed a plasmid-borne version of NAB3 deleted for the sequence encoding the 16 glutamines adjacent to its hnRNP-C-like α-helix. This plasmid was introduced into a yeast strain lacking chromosomal NAB3, thereby making it the sole source of this protein. To complete the strain, we also transformed cells with the GAL1 promoter-IMD2 terminator-GFP reporter plasmid. If the glutamine stretch is important for termination, cells should show increased fluorescence as gauged by flow cytometry compared with a strain with wild type NAB3 (11). Indeed, when grown on galactose, loss of the 16 glutamines resulted in readily detectable terminator readthrough (Fig. 5B, green) compared with cells expressing wild type Nab3 (red). (A perfectly functional terminator results in a population of cells that show a low level of autofluorescence with a peak centered at ≈10².) Deletion of the last three amino acids of Nab3 compromised termination as scored by this flow cytometry technique (11).
When the penultimate residue (Gln801; Fig. 1) was mutated to alanine (blue), the Nab3 termination function was compromised to an extent similar to the loss of the glutamine tract. When the residue before it (Leu800) was mutated to alanine, termination function was severely compromised, resulting in GFP intensities as strong as that observed from loss of the carboxyl-terminal three amino acids (Fig. 5B, orange) (11). These data underscore the importance of the very carboxyl-terminal sequence of Nab3 and the polyglutamine stretch. The difference in severity between the alanine substitutions in residues 800 versus 801 may be related to structural modeling, which suggested that Leu800 ends an α-helix whereas Gln801 is not in that helix (11).

DISCUSSION
Mutations in Nab3 that impaired its termination function have focused attention on the extreme carboxyl terminus of the protein (11). This domain has structural homology with an α-helical region of hnRNP-C that was shown to enable this mammalian protein to tetramerize in its role as an RNA packaging protein (12). Adjacent to that is an unusually long stretch of glutamines. Here we show that this region of Nab3 is functionally similar to the cognate region of human hnRNP-C in that both self-assemble into tetramers. This strengthens the case that Nab3 is a yeast orthologue of mammalian hnRNP-C, a suggestion that initially arose due to homology between the RRM domains of the two proteins (16). A Nrd1 binding region near the Nab3 RRM has also been described (10), but other functions of this large yeast protein have remained mysterious. Study of full-length recombinant protein has proven difficult due to its unusual sequence composition, which seems responsible for its extreme lability (data not shown) (10).
Although hnRNP-C1 and -C2 (293 and 306 residues, respectively) and Nab3 (802 residues) share similar RNA binding and tetramerization domains, the positioning of these motifs in the respective primary sequences differs between the proteins, and the surrounding sequences are not closely related. Thus, although the RRM and assembly modules are conserved between the yeast and human proteins, the overall protein structure and function are more divergent. Whereas both polypeptides serve to size-select newly made RNAs, they do so in opposite ways. hnRNP-C binds and measures relatively long RNAs for processing and export (17) whereas Nab3 preferentially facilitates termination of short RNAs in yeast (18). hnRNP-C is highly abundant (≈10⁶/cell (19)), whereas the level of Nab3 is more modest (<6,000 copies/cell (20)). Genetic probing revealed that the self-assembly region of Nab3 is important for the biological function of the protein (11). The significance of the cognate region in hnRNP-C was shown biochemically, due in part to the ability to reconstitute RNA-protein particles with recombinant full-length hnRNP-C (21). hnRNP-C does not have a polyglutamine stretch and is toxic when expressed in S. cerevisiae (22).
Low complexity protein sequences, including homopolymeric runs of certain amino acids, are distributed nonrandomly in proteins. RNA-binding proteins are overrepresented in this group (23). The sequences tend to be involved in protein-protein interactions resulting in aggregation or assembly of dynamic polymers including hydrogels. This property has been suggested to be functionally important for the reversible formation of cytoplasmic granules that harbor RNA-binding proteins (23).
Nab3 contains numerous regions of low complexity sequence across its length, including 12 consecutive aspartates and 10 consecutive glutamates that are embedded in a stretch of 30 residues that are one or the other of these residues. Nab3 also has numerous uninterrupted glutamine runs including three stretches of 6 and one of 16. Our results suggest a role for the latter, which is the longest of the Nab3 homopolymeric tracts and which is immediately adjacent to the hnRNP-C self-assembly region important for transcription termination. This glutamine stretch augments the Nab3 protein-protein interaction, as has been seen elsewhere for polyglutamine tracts (9), and is consistent with suggestions that polyglutamine tracts form an extended helix when placed next to sequences with an α-helical character (9). Whether this is the case here will require more detailed structural analysis. Polyglutamine stretches are seen in human diseases in which the repeat expands by mutation. In general, the repeat expansions in humans are not sufficient for pathology. Instead, it seems that the gain of function in such mutant proteins results from enhancement of its native function rather than acquisition of a new role (6). This report provides some insight into a role for the natural function of a 16-glutamine stretch in a yeast protein whose structure is not well understood. We also learned there is a region neighboring the low complexity hnRNP-C-like multimerization tail that is essential for cell viability. Thus, there is another important domain in this unexplored region of Nab3. Whether its role is related to that of the extreme carboxyl-terminal tail remains to be elucidated.
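Homopolymeric runs of the kind enumerated above are straightforward to locate computationally. The following Python sketch is illustrative only: the function name and the minimum-length threshold are assumptions, and the test sequence is a toy construct (16 glutamines followed by the 18-residue tail encoded by the pET32a18aa oligonucleotide), not the full-length Nab3 sequence.

```python
import re


def homopolymer_runs(sequence, min_len=6):
    """Return (residue, 1-based start, run length) for each run of min_len or more."""
    pattern = re.compile(r"(.)\1{%d,}" % (min_len - 1))
    return [
        (match.group(1), match.start() + 1, len(match.group(0)))
        for match in pattern.finditer(sequence)
    ]


# Toy sequence (assumption): 16 glutamines placed directly before the 18-residue tail.
TOY_TAIL = "Q" * 16 + "PAGNNVQSLLDSLAKLQK"

runs = homopolymer_runs(TOY_TAIL)
print(runs)
```

Applied to a complete protein sequence, the same function would report every tract at or above the chosen threshold, which is one simple way to tabulate low complexity regions before asking which of them matter functionally.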
These findings suggest an extension of current models positing that the Nab3/Nrd1 termination machinery forms a network with nascent RNA and the RNA polymerase II CTD (10,24). Based upon the roles of the Nab3 glutamine tracts in both termination and protein-protein interaction, this nuclear complex could resemble the higher order assemblage seen in cytoplasmic RNA-binding proteins that contain low complexity primary amino acid sequences (23,25). Self-association of Nab3 subunits provides an additional tether between the Nab3-Nrd1 heterodimers as they form an array on RNA substrates in which they recognize multiple consensus sequence motifs (10). The number and positioning of Nab3-Nrd1 heterodimers attracted to a single RNA could be governed, not only by the quantity of consensus sites on the RNA, but also through stabilization by the additional association of multiple Nrd1 subunits with the repeats in the CTD of RNA polymerase II (10). Reversible assembly may regulate the ability of the Nrd1-Nab3 complex to terminate transcription and guide substrate RNAs through their respective processing or degradative pathways. Determination of the structure and stoichiometry of such a hypothetical (Nab3-Nrd1)n-RNA complex likely depends upon the specific RNA substrate. Future analysis of these interactions will be aided by in vitro reconstitution from purified RNA and protein components.