Divergence of Pumilio/fem-3 mRNA Binding Factor (PUF) Protein Specificity through Variations in an RNA-binding Pocket*

Background: PUF protein RNA recognition is critical for target gene regulation.
Results: A chemically conserved binding pocket in a subset of PUF proteins recognizes cytosine at different positions upstream of the core PUF recognition sequence.
Conclusion: A specialized cytosine-binding pocket introduces qualitative and quantitative differences in RNA recognition by PUF proteins.
Significance: Simple adaptations can diversify PUF protein RNA recognition.

mRNA control networks depend on recognition of specific RNA sequences. Pumilio-fem-3 mRNA binding factor (PUF) RNA-binding proteins achieve that specificity through variations on a conserved scaffold. Saccharomyces cerevisiae Puf3p achieves specificity through an additional binding pocket for a cytosine base upstream of the core RNA recognition site. Here we demonstrate that this chemically simple adaptation is prevalent and contributes to the diversity of RNA specificities among PUF proteins. Bioinformatics analysis shows that mRNAs associated with Caenorhabditis elegans fem-3 mRNA binding factor (FBF)-2 in vivo contain an upstream cytosine required for biological regulation. Crystal structures of FBF-2 and C. elegans PUF-6 reveal binding pockets structurally similar to that of Puf3p, whereas sequence alignments predict a pocket in PUF-11. For Puf3p, FBF-2, PUF-6, and PUF-11, the upstream pockets and a cytosine are required for maximal binding to RNA, but the quantitative impact on binding affinity varies. Furthermore, the position of the upstream cytosine relative to the core PUF recognition site can differ, which in the case of FBF-2 originally masked the identification of this consensus sequence feature. Importantly, other PUF proteins lack the pocket and so do not discriminate upstream bases. A structure-based alignment reveals that these proteins lack key residues that would contact the cytosine, and in some instances, they also present amino acid side chains that interfere with binding. Loss of the pocket requires only substitution of one serine, as appears to have occurred during the evolution of certain fungal species.

mRNA control is pervasive. Translation, stability, and localization of many mRNAs are governed by elements in their 3′ untranslated regions (3′UTRs) (1). The specificity of interactions between 3′UTRs and regulatory proteins underlies networks of control, enabling coordinate regulation of functionally related mRNAs (2-4). The RNA sequences recognized are often single-stranded, requiring discrimination of a specific series of nucleotides rather than folded structures.
The PUF family of proteins regulates mRNAs using a common polypeptide scaffold. These proteins comprise a series of α-helical repeats arranged along an arc (Fig. 1) (5,6). A ladder of α-helices, the so-called RNA recognition helices, lies on the concave face of the protein (7). Each helix contacts predominantly one base, using two amino acid side chains to make edge-on contacts and another to stack between adjacent bases (Fig. 1). The simplest condition, exemplified by human Pumilio 1, uses eight helices to recognize eight RNA bases (7).
Variations of the PUF scaffold enable different PUF proteins to discriminate unique groups of mRNAs, even though the proteins use very similar sets of atomic contacts. In FBF-2, for example, a distortion in the central region of the protein requires the presence of an "extra" base relative to Pumilio (8).
Yeast Puf4p also demands an extra base but at a different location and with distinct structural changes (4,9). In these cases, the key base does not contact the protein but is solvent-exposed, or "flipped." FBF-1 and FBF-2 are two nearly identical proteins with highly overlapping functions, collectively referred to as FBF. The core RNA-binding site for FBF, termed the FBF binding element (FBE), is the 9-mer sequence 5′-UGUDHHAUA-3′, where H is A, U, or C, and D is A, G, or U (10). The FBE, as defined in this report, represents the highest affinity sites. It was elucidated using either purified FBF protein in vitro or through yeast three-hybrid assays (10). In vivo, natural FBF binding sites generally conform to this in vitro FBE but include variations with suboptimal affinity.
PUF protein core RNA-binding sites are recognized by RNA recognition helices R1 to R8 (Fig. 1). PUF proteins contain a C-terminal α-helical region, or pseudo-repeat, that contributes an additional helix to the RNA-binding surface. In yeast Puf3p, that pseudo-repeat (called R8′) combines with parts of repeat 8 to form a pocket that binds a cytosine residue two positions 5′ of the core RNA-binding site (11). The presence of a C at that −2 position is required for tight binding to target mRNAs in vitro and for regulation in vivo (11). Most mRNAs associated with Puf3p in vivo possess a C at this position and encode proteins with mitochondrial functions (4). This extra binding pocket can be viewed as a specificity device that enables Puf3p to bind only its own targets (11).
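The FBE consensus above uses IUPAC ambiguity codes, so it translates directly into a pattern search. The following is a minimal sketch, not part of the original analysis; the function names are hypothetical, and the only inputs taken from the text are the consensus UGUDHHAUA and the definitions of D and H.

```python
import re

# IUPAC codes needed for the FBE consensus, in the RNA alphabet.
# D = A, G, or U; H = A, C, or U (as defined in the text).
IUPAC = {"A": "A", "C": "C", "G": "G", "U": "U", "D": "[AGU]", "H": "[ACU]"}

FBE_CONSENSUS = "UGUDHHAUA"  # 9-mer consensus quoted in the text


def consensus_to_regex(consensus):
    """Translate an IUPAC consensus string into a compiled regular expression."""
    body = "".join(IUPAC[code] for code in consensus)
    # The lookahead makes the search report overlapping occurrences as well.
    return re.compile("(?=(" + body + "))")


def find_fbes(rna):
    """Return (0-based start, matched 9-mer) for every FBE in an RNA string."""
    rna = rna.upper().replace("T", "U")  # tolerate DNA-style input
    pattern = consensus_to_regex(FBE_CONSENSUS)
    return [(m.start(), m.group(1)) for m in pattern.finditer(rna)]


if __name__ == "__main__":
    # 13-mer gld-1 FBEa sequence quoted later in the article.
    print(find_fbes("UCAUGUGCCAUAC"))
```

A site reported at start index s has its −1 and −2 bases at indices s − 1 and s − 2, which is how the upstream positions discussed below can be read off.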
We sought to understand the appearance and loss of upstream C-binding pockets among PUF proteins at the structural level. Analysis of the RNAs associated with FBF-2 in vivo revealed that they contain an upstream C residue critical for binding. Structural analysis of FBF-2 revealed that it possesses a binding pocket chemically similar to that of yeast Puf3p. Closely related pockets were identified in Caenorhabditis elegans PUF-6 and PUF-11, and in each case, the pocket enhanced binding to RNA, although with a unique quantitative impact. In contrast, other PUF proteins lack the pocket and do not discriminate upstream bases. Structure-based sequence alignments of proteins with and without upstream pockets reveal the diagnostic features of the pocket, including a critical serine residue that directly contacts the upstream C. The simplicity of the variations required to gain or lose this specificity suggests an opportunity for rapid evolution of new networks of control.

EXPERIMENTAL PROCEDURES
Protein Expression and Purification-The RNA-binding domain of FBF-2 (residues Ser-164-Glu-575) was expressed and purified, and protein-RNA complexes were prepared as reported previously (8).
We also expressed PUF-6 with an N-terminal His6-small ubiquitin-like modifier (SUMO) tag (12). Protein expression was induced at 18°C for 20 h. Cell pellets were lysed in sonication buffer containing 20 mM Tris (pH 8.0), 0.5 M NaCl, 20 mM imidazole, 5% (v/v) glycerol, and 0.1% (v/v) β-mercaptoethanol. The fusion protein was purified using a nickel-affinity column, and the His6-SUMO tag was removed by Ulp-1 protease cleavage. PUF-6 was further purified with a heparin column and a Superdex 200 column as described above. The final yield from the His6-SUMO-tagged PUF-6 was about 0.5 mg/liter culture, ~10-fold greater than from the GST fusion.
Site-directed mutants, generated using a QuikChange site-directed mutagenesis kit (Agilent), were purified following the same protocols as for wild-type proteins.
Crystals of PUF-6 with 5BE13 RNA were grown in hanging drops at 20°C with crystallization solution (15% (w/v) PEG 3350, 0.1 M succinic acid (pH 7.0)). The crystals were flash-frozen in crystallization solution with 15% (v/v) glycerol. Diffraction data were collected at the Southeast Regional Collaborative Access Team beamline 22-ID (wavelength 1.0 Å) at the Advanced Photon Source, Argonne National Laboratory. Data were indexed and scaled with HKL2000 (13). Data collection and processing statistics are shown in supplemental Table 1.
Structure Determination and Refinement-The crystal structure of the FBF-2-gld-1 FBEa13 complex was determined by molecular replacement using a structure of FBF-2 (PDB code 3K5Y) as the search model. The RNA was excluded from the initial searching and phase calculations, and high-temperature simulated annealing (2500 K) was performed to reduce model bias. The model was refined with CNS (14) and rebuilt manually using O (15). Phenix was employed for addition of water molecules and translation/libration/screw refinement (16). The final FBF-2-gld-1 FBEa13 structure comprises residues Leu-168 to Ser-567 and nucleotides −2C to +9A. The structure of FBF-2 with −1C FBEa13 RNA was determined using the structure of FBF-2-gld-1 FBEa containing nucleotides +1U to +9A as the model and refined as for the complex with gld-1 FBEa13.
The structure of the PUF-6-5BE13 complex was determined by molecular replacement using the protein structure of FBF-2 (3K5Y) as a search model in Phaser (17). Coot (18) and Phenix (16) were used for model building and refinement, respectively. The final model contains PUF-6 residues 82-403 and 409-452 and nucleotides −1C to +6C. All structures were analyzed with MolProbity (19). All torsion angles are within allowed regions of the Ramachandran plot, and ≥97% are in the most favored regions. Refinement statistics are presented in supplemental Table 1.
Electrophoretic Mobility Shift Assays-Equilibrium dissociation constants were determined as reported previously (8). Briefly, radiolabeled RNA oligonucleotides (100 pM for FBF-2 and PUF-6, 10 pM for PUF-11) and serially diluted protein were incubated in a buffer containing 10 mM HEPES (pH 7.4), 50 mM NaCl, 0.1 mg/ml BSA, 0.01% (v/v) Tween 20, 0.1 mg/ml yeast tRNA, 1 mM EDTA, and 1 mM DTT. After addition of Ficoll 400 to 2.5% (v/v), reaction mixtures were run on 10% polyacrylamide gels. The apparent dissociation constants, mean, and S.E. were calculated using GraphPad Prism (Graphpad LLC) by fitting data from at least three independent experiments using non-linear regression with a one-site, specific binding model. The percentage of active protein was determined as described previously (20). Wild-type and mutant FBF-2 and PUF-6 were ~93% active. Wild-type PUF-11 was ~98 and ~99% active in different assays, and PUF-11 S491A was ~89% active. Corresponding adjustments were made to Kd values in supplemental Tables 2-4.
Computational Analyses-C. elegans 3′UTR sequences (WS190) were downloaded from BioMart and analyzed as described in the text.
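The one-site, specific binding model named above has the form B = Bmax·[P]/(Kd + [P]). The sketch below illustrates such a fit in plain Python; it is not the GraphPad Prism procedure used in the study. The grid-scan fitting routine, the function names, and the convention for the active-fraction correction (scaling the apparent Kd by the fraction of active protein) are all assumptions made for illustration.

```python
import math


def fraction_bound(protein_nM, kd_nM, bmax=1.0):
    """One-site specific binding: B = Bmax * [P] / (Kd + [P])."""
    return bmax * protein_nM / (kd_nM + protein_nM)


def fit_one_site(protein_nM, bound, kd_min=1e-3, kd_max=1e4, steps=4000):
    """Least-squares estimate of (Kd, Bmax) by scanning Kd on a log grid.

    For each trial Kd the model is linear in Bmax, so the best Bmax has a
    closed form; the trial with the smallest squared error is returned.
    This is an illustrative stand-in for non-linear regression software.
    """
    lo, hi = math.log10(kd_min), math.log10(kd_max)
    best = None
    for i in range(steps + 1):
        kd = 10.0 ** (lo + i * (hi - lo) / steps)
        x = [p / (kd + p) for p in protein_nM]
        bmax = sum(xi * yi for xi, yi in zip(x, bound)) / sum(xi * xi for xi in x)
        sse = sum((yi - bmax * xi) ** 2 for xi, yi in zip(x, bound))
        if best is None or sse < best[0]:
            best = (sse, kd, bmax)
    return best[1], best[2]


def correct_for_active_fraction(kd_apparent_nM, active_fraction):
    """Assumed convention: only the active fraction of the nominal protein
    concentration binds RNA, so the apparent Kd is scaled by that fraction."""
    return kd_apparent_nM * active_fraction


if __name__ == "__main__":
    # Synthetic, noise-free titration generated with Kd = 3 nM (not real data).
    proteins = [0.1, 0.3, 1.0, 3.0, 10.0, 30.0, 100.0, 300.0]
    bound = [fraction_bound(p, 3.0) for p in proteins]
    print(fit_one_site(proteins, bound))
```

The model assumes the labeled RNA is at trace concentration relative to Kd, so that free protein approximates total protein; whether that holds for the tightest complexes would need to be checked case by case.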

FBF-2 Target mRNAs in vivo Possess an Upstream −1 or −2C-To define sequences responsible for FBF binding to its target mRNAs in vivo, we identified mRNAs selected from those associated with FBF in C. elegans extracts (21). In that study, FBF transgenes had been created in C. elegans and were used to perform immunoprecipitation (IP) of FBF followed by RNA microarray analysis (RIP chip). These studies identified 1350 putative FBF target mRNAs, defined by their enrichment in the FBF IP compared with the negative control IP.
To identify possible FBF binding sites, we first performed an unbiased search for RNA sequence motifs enriched in the 3′UTRs of the 200 mRNAs showing the highest enrichment in the FBF IP versus control IP. The de novo motif-finding tool Cosmo (22) revealed a motif containing the FBE that also suggested a preference for additional nucleotides flanking the core FBE, including an upstream C at positions −1 and −2 (Fig. 2A, see "Experimental Procedures"). To examine the −1 and −2 positions in greater depth, we identified high-confidence FBF binding sites as outlined in Fig. 2B. We focused on mRNAs for the 500 probe sets that were most enriched in the FBF RIP chip study. This group contained several previously validated FBF targets. In addition, this group contained a significant enrichment (p < 10^−90) of FBE-bearing transcripts compared with all genes. 82% of these mRNAs contained an FBE in the mRNA 3′UTR compared with only 29% for all C. elegans mRNAs (21). Among these 500, we considered further only the 338 unique, unambiguous mRNAs with annotated 3′UTRs. We eliminated mRNAs that possessed more than one putative FBE, as we could not discern which elements were functional in vivo. This scheme yielded 149 likely FBF targets with single FBEs in their 3′UTRs.
The majority (82%) of high-confidence FBF binding sites contained a C at either the −1 or −2 position or both (Fig. 2C). 73% contained a single C, whereas 9% contained a C at both positions. We conclude that high-confidence in vivo FBF binding sites are enriched for a C at either position −1 or −2 upstream of the core PUF binding site. Similarly, mRNAs encoding synaptonemal complex proteins, shown independently to be FBF targets, contain a −2C (23).
To discern any additional sequence patterns, we used Cosmo (22) to determine motifs for four groups of FBEs, those with C only at −2 (CD), those with C only at −1 (DC), those with C at both −1 and −2 (CC), and those with no C at −1 or −2 (DD). The data yielded several patterns (Fig. 2C). All motifs contained a UGU at positions +1 to +3 and an AU dinucleotide at positions +7 and +8, matching the consensus FBE sequence. Of the 63 FBEs with a −2C, 24 have −1A, 21 have −1U, 14 have −1C, and 4 have −1G, indicating that any base may be acceptable, but a preference is observed for −1A, U, or C. RNAs without a −1 or −2C were enriched for a C at position +6 and an A at position +9, as compared with all other FBEs. These observations suggest that the presence of a C at either −1 or −2 is conserved among many FBF target sites. Earlier work that analyzed naturally occurring mutations in fem-3 mRNA noted that a C at what we now understand is the −2 position was required for regulation in vivo (24). In the absence of an upstream C, nucleotides at other positions may compensate to increase affinity, as RNAs with +6C or +9A bind more tightly to FBF-2 than RNAs with other bases at these positions (8).
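The final filtering step, keeping only transcripts whose 3′UTR carries exactly one consensus FBE, can be sketched as follows. This is an illustration under stated assumptions rather than the pipeline used in the study: the function names are hypothetical, the input is assumed to be a mapping from transcript identifier to 3′UTR sequence, and the toy sequences are invented.

```python
import re

# FBE consensus UGUDHHAUA as a regular expression (D = A/G/U, H = A/C/U).
# The lookahead lets overlapping occurrences be counted as separate sites.
FBE = re.compile(r"(?=UGU[AGU][ACU][ACU]AUA)")


def count_fbes(utr):
    """Number of consensus FBEs in one 3'UTR sequence."""
    return len(FBE.findall(utr.upper().replace("T", "U")))


def single_fbe_targets(utrs):
    """Keep only transcripts whose 3'UTR contains exactly one FBE.

    `utrs` maps a transcript identifier to its 3'UTR sequence.
    """
    return {name: seq for name, seq in utrs.items() if count_fbes(seq) == 1}


if __name__ == "__main__":
    # Hypothetical toy sequences, not real C. elegans 3'UTRs.
    toy = {
        "one_site": "AAUCAUGUGCCAUACAA",
        "two_sites": "UGUGCCAUAAAAUGUAAUAUA",
        "no_site": "AAAAACCCCCGGGGG",
    }
    print(sorted(single_fbe_targets(toy)))
```

Discarding multi-site transcripts trades sample size for certainty about which element is bound, which is the rationale given in the text.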
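The four groups (CD, DC, CC, and DD) are defined purely by the bases at −2 and −1, so assigning a site to a group is a two-character lookup once the FBE has been located. A minimal sketch follows, with hypothetical function names; here "D" simply means "not C", matching the IUPAC meaning of D.

```python
import re
from collections import Counter

# FBE consensus UGUDHHAUA as a regular expression (D = A/G/U, H = A/C/U).
FBE = re.compile(r"UGU[AGU][ACU][ACU]AUA")


def upstream_class(utr):
    """Classify the first FBE in `utr` by the bases at -2 and -1.

    Returns 'CD', 'DC', 'CC' or 'DD' (position -2 first, then -1), or None
    when no FBE is found or fewer than two bases precede it.
    """
    utr = utr.upper().replace("T", "U")
    match = FBE.search(utr)
    if match is None or match.start() < 2:
        return None
    minus2 = utr[match.start() - 2]
    minus1 = utr[match.start() - 1]
    return ("C" if minus2 == "C" else "D") + ("C" if minus1 == "C" else "D")


def tally_classes(utrs):
    """Count how many sequences fall into each upstream class."""
    classes = (upstream_class(u) for u in utrs)
    return Counter(c for c in classes if c is not None)


if __name__ == "__main__":
    # Upstream variants of the gld-1 FBEa 13-mer discussed in the article.
    print(tally_classes(["UCAUGUGCCAUAC", "UACUGUGCCAUAC", "UAAUGUGCCAUAC"]))
```

Grouping sites this way before building motifs avoids averaging the −1C and −2C classes into a single weak signal.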
An Upstream C Is Required for Tight Binding by FBF-To evaluate the functional significance of an upstream C at the −1 or −2 position, we determined the in vitro binding affinity of FBF-2 for target sequences in the gld-1 3′UTR. The 3′UTR of the gld-1 mRNA contains a well defined FBF binding site with a −2C, termed FBEa (5′-UCAUGUGCCAUAC-3′, in which the −2C is the second nucleotide). We measured the binding affinity to 9-mer (+1 to +9) and 13-mer (−3 to +10) RNAs derived from the gld-1 FBEa using electrophoretic mobility shift assays (Fig. 3 and supplemental Table 2). The 9-mer RNA (FBEa9) represents the conserved core FBF binding sequence beginning with a 5′UGU (nucleotides 4-6 of the sequence above), and the 13-mer RNAs (FBEa13) contain three additional nucleotides upstream and one additional nucleotide downstream of the 9-mer core sequences. FBF-2 binds the FBEa13 RNA (WT UCA) ~9-fold more tightly than the FBEa9 RNA (Kd 3 nM versus 28 nM).
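The fold differences quoted in this section are ratios of dissociation constants, and each ratio corresponds to a difference in binding free energy, ΔΔG = RT ln(Kd,weak/Kd,tight). The sketch below works through the 9-mer versus 13-mer comparison (28 nM versus 3 nM). The free-energy conversion is not reported in the article, and the temperature of 298.15 K is an assumption chosen only for illustration.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def fold_change(kd_weak_nM, kd_tight_nM):
    """Ratio of two dissociation constants (weaker over tighter)."""
    return kd_weak_nM / kd_tight_nM


def ddg_kcal_per_mol(kd_weak_nM, kd_tight_nM, temperature_K=298.15):
    """Difference in binding free energy, ddG = RT ln(Kd_weak / Kd_tight).

    The default temperature is an assumption, not a value from the article.
    """
    return R_KCAL * temperature_K * math.log(kd_weak_nM / kd_tight_nM)


if __name__ == "__main__":
    # Kd values quoted in the text for FBEa9 (28 nM) and FBEa13 (3 nM).
    print(round(fold_change(28.0, 3.0), 1))
    print(round(ddg_kcal_per_mol(28.0, 3.0), 2))
```

On this scale a roughly 9-fold change in affinity is a little over 1 kcal/mol, which is in the range typically associated with one or two hydrogen bonds.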
To determine whether this increased binding affinity was due to the presence of the upstream −2C or merely stronger binding to a longer RNA, we tested binding to 13-mer RNAs with mutated sequences in the upstream region. Changing C to A at position −2 (upstream UCA to UAA) decreased affinity ~19-fold (Fig. 3B). However, binding affinity was restored by insertion of a C at position −1 (upstream UAA to UAC), indicating that a C at either −1 or −2 was sufficient for higher affinity binding and that this higher affinity was not simply due to a longer RNA. A C at position −3 (upstream UAA to CAA) resulted in only a modest increase in affinity (Kd of 43 versus 57 nM). Other RNAs lacking a −1 or −2C with three consecutive purines (upstream AAA or GAG) bound ~22-fold more weakly than the wild-type RNA. Similarly, an RNA with a −2U (upstream UUA) bound ~13-fold more weakly than the wild-type RNA. The RNAs lacking a −1 or −2C in the upstream sequence bound more weakly than the 9-mer RNA without an upstream sequence, suggesting that the presence of non-cognate upstream bases interferes with binding to FBF-2.
We also measured binding of FBF-2 to 13-mer natural target sequences in fem-3, fog-1, and a second site in gld-1 (FBEb). These sequences differ from the gld-1 FBEa in upstream sequence as well as other varied positions in the core FBE. All sequences bound with affinity similar to that of gld-1 FBEa13 (supplemental Table 2). Together these binding data suggest that in different contexts a −1 or −2C promotes tighter binding of FBF-2.
FIGURE 2. ... (21) were analyzed using MEME. Shown is part of a motif identified using MEME that contains an FBE. B, scheme for identifying high-confidence FBF binding sites. The highest-confidence FBF binding sites likely correspond to FBEs in putative FBF target 3′UTRs with exactly one consensus FBE. The 500 highest scoring probe sets in the FBF RIP chip corresponded to 317 unique mRNAs with annotated 3′UTRs. Of these, 149 had only one FBE. C, proportion of high confidence FBEs with Cs at position −1 and/or −2.
Structure of the FBF-2 Upstream C-binding Pocket-To understand the molecular basis for the interaction of an upstream C with FBF-2, we determined the crystal structures of FBF-2 in complex with a wild-type gld-1 FBEa13 (5′-UCAUGUGCCAUAC-3′) possessing a −2C or a mutant FBEa13 (5′-UACUGUGCCAUAC-3′) with a −1C. The protein structures in these two complexes and in complex with the 9-mer gld-1 FBEa9 (PDB code 3K5Y) are unchanged (root mean square deviation of 0.3-0.7 Å over 394 Cα atoms). The RNA from positions +1 to +9 in these three structures can be superimposed (Fig. 4A). In the structures of FBF-2 in complex with the two 13-mer RNAs, the upstream C, whether at the −1 or −2 position, is bound in a pocket between the last RNA-binding repeat 8 and the C-terminal helix, termed 8′ (Fig. 4B). Modest changes in the RNA backbone conformations allow accommodation of a −1 or −2C. In the structure with wild-type gld-1 FBEa13 RNA with a −2C, the −1A base between the −2C and +1U is not contacted by the protein.
FBF-2 makes specific contacts with the Watson-Crick edge of the upstream C through hydrogen bonds with main chain atoms of Phe-495 and Ser-554 and the side chain of Ser-554 (Fig. 4C). To explore the importance of FBF-2 Ser-554 for specific recognition of the upstream C, we mutated Ser-554 to alanine and determined the binding affinity of the mutant protein for RNAs with and without an upstream C. As a control, the mutant protein bound to the 9-mer gld-1 FBEa9 with similar affinity to wild-type protein (Fig. 3B and supplemental Table 2), suggesting that the mutant is properly folded and residue Ser-554 is not involved in interacting with the 9-nucleotide core sequence. The S554A mutant protein bound 2- to 3-fold more weakly than wild-type protein to RNAs with −1 or −2C (Fig. 3B and supplemental Table 2), indicating that Ser-554 contributes to binding to the upstream C. The S554A mutant protein also bound ~2-fold more weakly to RNAs lacking either a −1 or −2C (Kd 17-21 nM) than to RNAs with a −1 or −2C (Kd 8-10 nM), suggesting that even without Ser-554, the upstream C-binding pocket retains modest base selectivity. However, the selectivity is decreased compared with the 20-fold difference observed for wild-type protein with the same RNAs. Consistent with this, the S554A mutant protein bound ~3-fold more tightly to RNAs lacking an upstream C (Kd ~19 nM versus ~50 nM for wild-type protein). On the basis of the crystal structures, we expect a purine base at the −2 position (as is present in the weaker-binding non-C containing RNA mutants) would clash sterically with the serine side chain of Ser-554 in the C-binding pocket, but not with that of an alanine. Thus S554A allows accommodation of a purine and tighter binding to the mutant RNAs.
Conservation of the Upstream C-binding Pocket-Crystal structures of yeast Puf3p in complex with COX17 RNAs have shown a similar binding pocket for a conserved C at the −2 position (11). Interaction between the upstream C and the protein is similar in FBF-2 and Puf3p (Fig. 4D). A serine residue at the beginning of the C-terminal helix (Ser-554 in FBF-2 and Ser-866 in Puf3p) makes specific hydrogen bonds with the upstream C. In addition, both FBF-2 and Puf3p utilize main chain atoms to form hydrogen bonds with the upstream C. Residue Leu-864 in Puf3p forms a stacking interaction with the −2C. Phe-552 in FBF-2 occupies the equivalent position, and its aromatic ring is positioned in a non-parallel orientation relative to the upstream C.
Using the FBF-2 and Puf3p structures, we created a structure-based amino acid sequence alignment of the C-terminal regions to identify equivalent binding pockets in other PUF proteins (Fig. 5A). Earlier sequence-based searches suggested the presence of such a binding pocket in only a limited number of other yeast family PUF proteins (11). The new structure-guided sequence alignment in this region revealed that the C-binding serine is conserved in C. elegans FBF-1/2, PUF-5/6, and PUF-3/11 but not in PUF-8/9. The consensus recognition sequences for these families of PUF proteins suggest a conserved −1C in the 5BE recognized by PUF-5/6 (25) and in some sequences recognized by PUF-11 (26). In yeast Puf4p and human Pumilio 1, the C-binding serine is replaced by a histidine or tyrosine side chain, which occludes this binding pocket (11). On the basis of this sequence alignment, we predict that the bulky side chains in C. elegans PUF-8, yeast Mpt5p, and Arabidopsis PUF proteins also prevent an equivalent upstream C-binding pocket in these proteins. Thus, a small change in amino acid sequence may create a new binding pocket and change specificity.
C. elegans PUF-6 Upstream C-binding Pocket Is Restricted to −1C-To test our prediction of an upstream C-binding pocket in PUF-6, we determined the crystal structure of C. elegans PUF-6 in complex with a 13-mer RNA containing its optimal binding element, 5BE (5BE-13, 5′-CUCUGUAUCUUGU-3′). The overall structure of PUF-6 is similar to that of other PUF protein structures, with RNA bound on the concave surface of the protein (supplemental Fig. 1A). We were able to build a model for bases −1C to +6C of the 5BE RNA, seven of the 13 bases in the RNA sequence. We observed only discontinuous electron density for bases +7U to +10U, indicating disorder in this region. The RNA structure in the PUF-6-5BE complex is similar to that of FBF-2 in the central region (supplemental Fig. 1B). Bases 4-6 stack with each other and turn away from the RNA-binding surface of the protein (supplemental Fig. 1C). However, residue Arg-256 in PUF-6 forms a hydrogen bond with +6C, and the base at position +5 is not contacted by the protein, whereas the corresponding residue Arg-364 in FBF-2 often contacts the +5 base, and the base at position +6 is not contacted (supplemental Fig. 1D). The orientation of RNA-interacting helices in repeats 1-4 of PUF-6 differs from those of FBF-2 (supplemental Fig. 1B), consistent with the different recognition sequences of the two proteins.
As predicted, the crystal structure of PUF-6 confirms the presence of an upstream C-binding pocket. PUF-6 shares the same sequence motif, "FSSGKK," in the C-terminal helix as FBF-2. Thus the binding pocket for the upstream C in PUF-6 is almost identical to the pocket in FBF-2 (Fig. 5B). The Watson-Crick edge of −1C forms hydrogen bonds with main chain atoms of Phe-383 and Ser-441 and the side chain of Ser-441. Phe-439 also contributes to forming the C-binding pocket.
To probe the importance of the upstream C-binding pocket in PUF-6, we mutated Ser-441 to alanine and determined the RNA-binding activity of the wild-type and mutant proteins. Wild-type PUF-6 bound to 5BE13 RNA with high affinity (Kd = 7.4 nM, Fig. 5C and supplemental Table 3). It bound equally well to 5BE11 RNA, which starts at the −1C (data not shown), indicating that positions −3 and −2 in the RNA sequence make little contribution to the binding. In contrast, PUF-6 bound 10-fold more weakly to 5BE10 RNA, which begins with the 5′UGU and lacks upstream sequences. RNAs with a −2C and/or −3C also bound more weakly than wild-type 5BE13 to PUF-6. Thus, PUF-6 recognizes only an upstream C at position −1. The S441A mutant bound ~3-fold more weakly than wild-type protein to 5BE13 RNA with a −1C, consistent with the importance of Ser-441 in forming the upstream C-binding pocket.
C. elegans PUF-11 Binds to an Upstream C at the −1 or −2 Position-Among C. elegans PUF proteins, PUF-11 is unusual in its flexibility to recognize three distinct classes of core consensus sequence using at least two different binding modes (26). PUF-11 binds with a Kd of 0.05 nM to a model class I PUF-11 binding sequence (11BE I-1, 5′-UACUGUGAAUAGG-3′) (Fig. 5D and supplemental Table 4). Mutation of the −1C in this sequence to A (upstream UAC to UAA) decreases affinity nearly 60-fold (Kd = 2.9 nM). A −2C (upstream UCA) restores binding affinity similar to that with −1C (Kd = 0.11 nM), but a −3C (upstream CAA) binds with an affinity similar to having no upstream C (Kd = 1.3 nM). Yeast 3-hybrid RNA selection experiments identified 66 unique sequences that associate with PUF-11 (26). Of these 66 sequences, 86% contain a −1 or −2C.
Mutation of the putative C-binding pocket of PUF-11 (S491A) resulted in a protein that bound 11BE I-1 RNA 64-fold more weakly than wild-type PUF-11 (Fig. 5D and supplemental Table 4). PUF-11 S491A bound to RNA with no upstream C (upstream UAC to UAA) with an affinity similar to wild-type PUF-11 for the same RNA (Kd = 2.9 nM), and the affinity of the mutant protein for RNA with a −2C or −3C was reduced ~100-fold below the affinity of wild-type PUF-11 for 11BE I-1 RNA. These data suggest that PUF-11 recognizes an upstream C at position −1 or −2 and that Ser-491 is essential in forming the upstream C-binding pocket of PUF-11.
Divergence of the Upstream C-binding Pocket during Evolution-To examine the evolution of the upstream C-binding pocket, we prepared an alignment of amino acid sequences of Puf3p homologues in 23 fungal species for which Puf3p orthologues have been identified (Fig. 6 and supplemental Fig. 2) (27). The regions predicted to contain R8′ helices are shown in Fig. 6, and longer regions predicted to comprise RNA-binding helices R7, R8, and R8′ are shown in supplemental Fig. 2. Most species possess the equivalent of Ser-866 of S. cerevisiae. However, five species diverge. Three of these species lack a Puf3p with the critical serine within a predicted R8′ helix. These include the fission yeast Schizosaccharomyces japonicus with two duplicated Puf3p genes and Aspergillus nidulans and Neurospora crassa with single Puf3p genes. The two fission yeasts, Schizosaccharomyces octosporus and Schizosaccharomyces pombe, which also possess duplicated Puf3p genes, encode one copy predicted to have an upstream C-binding pocket and a second (designated "B") lacking the critical serine. Although the sequence alignment might not detect the appropriate serine residue, a protein fold recognition threading program used to guide alignments near the R8′ helix also did not detect a conserved serine. The simplest hypothesis is that the Puf3p protein ancestral to all five of these species possessed the pocket, which was selectively lost in a minority of descendants.
FIGURE 5. Upstream C-binding pockets in PUF proteins. A, structure-assisted sequence alignment of C-terminal sequences of selected PUF proteins. The aligned crystal structures of S. cerevisiae Puf3p, C. elegans FBF-2, human Pumilio 1, and S. cerevisiae Puf4p were used to generate a sequence alignment. Other PUF protein sequences were aligned manually. Residues in position to interact with upstream C residues are indicated in red. B, close-up view of the upstream C-binding pocket in the crystal structure of C. elegans PUF-6. A ribbon diagram of PUF-6 in complex with −1C RNA is shown. Dashed lines indicate contacts between PUF-6 and −1C. RNA is shown in a lighter shade. C, relative binding affinities of PUF-6 (wild-type and S441A mutant) for RNAs with varied upstream sequences. D, relative binding affinities of PUF-11 (wild-type and S491A mutant) for RNAs with varied upstream sequences.

DISCUSSION
Our identification of high-confidence FBF binding sites in the 3′UTRs of associated RNAs has led to the discovery of an upstream C-binding pocket at the C terminus of FBF, adding an additional element of specificity for this well studied PUF protein. The FBF upstream C-binding pocket shares recognition features with a −2C binding pocket in yeast Puf3p. The upstream C-binding pocket of FBF can recognize either a −1C or −2C with only a modest change in RNA structure. High-confidence FBEs in C. elegans mRNAs contain an upstream C at either position or both. In vitro RNA binding to RNAs without a −1C or −2C is reduced ~20-fold.
The importance of the upstream pocket in vivo is emphasized by C. elegans mutants obtained through unbiased genetic selections, performed long before the discovery of PUF proteins or upstream pockets (28). Normally C. elegans hermaphrodites first make sperm and then switch to making oocytes (Fig. 7, left panel). Dominant mutants were isolated that prevented the switch to oogenesis (28), and their molecular lesions were identified by sequencing (24). These point mutations lay in the 3′UTR of fem-3 mRNA in a region that we now know is the FBF binding element (Fig. 7, right panel). Five independently isolated alleles substituted the C at the −2 position of the binding site with a U. Others altered the G of the UGU sequence (24). This single base substitution at the −2 position prevented FBF repression of fem-3 with dramatic biological consequences: a failure to produce oocytes, and sterility.
The upstream pocket also guides different PUF proteins exclusively to their own correct mRNA targets in vivo. S. cerevisiae Puf3p binds to more than 150 mRNAs with mitochondrial functions (4). Most (88%) of these mRNAs, and others associated with Puf3p, possess a C at the −2 position (11). Mutations in the −2 position of Puf3p targets disrupt regulation by Puf3p in vivo (11). In contrast, Puf4p lacks an upstream pocket, and the mRNAs it binds in vivo lack a −2C (92%) (4, 11). Thus, the presence of a −2C in mitochondrial mRNAs dictates that they will be regulated by Puf3p, whereas the absence of a −2C from the targets of Puf4p helps exclude it (11).
FIGURE 6. Sequence alignment of predicted R8′ helices of Puf3p homologues in fungal species. Residues predicted to interact with upstream C residues are indicated in red. Dark and light gray denote 80% identical and 50% similar amino acids, respectively. The sequences of Puf3p species lacking a conserved serine residue were analyzed using structural prediction from GenThreader (35), and the residue that aligns with the C-binding serine is boxed in red. Extended alignments are presented in supplemental Fig. 2.
Our analysis emphasizes that simple derivation of consensus motifs can miss essential features of true RNA binding sites. The consensus FBF binding site, deduced by computational analysis of FBF targets in vivo, showed only a modest preference for a C at either position −1 or −2. Yet FBF-2 has a strong requirement for a C, which can be at either of the two positions. Simple consensus motifs combine the two classes of RNAs and so do not capture the requirement. Consensus motifs of certain DNA-binding proteins, such as Gata-4 and the homeodomain protein Nkx-2.5, have similar limitations (29) and emphasize the need for other modes of analysis.
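The dilution effect described above can be illustrated with a minimal sketch. The eight dinucleotides below are hypothetical upstream contexts (positions −2 and −1) invented for illustration only; they are not taken from the FBF target data, and the function names are likewise illustrative. Per-position counting, as in a simple consensus motif, reports only a modest C preference at each position even though every site carries a C at one of the two.

```python
# Hypothetical upstream contexts: first letter is position -2, second is -1.
# These sequences are made up for illustration, not drawn from any dataset.
upstream = ["CA", "CU", "CG", "CA", "AC", "UC", "GC", "UC"]


def c_frequency(sites, index):
    """Fraction of sites with a C in one column (0 is -2, 1 is -1)."""
    return sum(site[index] == "C" for site in sites) / len(sites)


def c_at_either(sites):
    """Fraction of sites with a C at -2 or at -1 (the joint requirement)."""
    return sum("C" in site for site in sites) / len(sites)


freq_minus2 = c_frequency(upstream, 0)  # what a per-position motif sees at -2
freq_minus1 = c_frequency(upstream, 1)  # what a per-position motif sees at -1
joint = c_at_either(upstream)           # what the protein actually requires

print(f"C at -2: {freq_minus2:.2f}, C at -1: {freq_minus1:.2f}, "
      f"C at either: {joint:.2f}")
```

In this toy set each column shows a C in only half of the sites, whereas the joint test is satisfied by all of them, which is the kind of feature a column-by-column consensus cannot represent.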
Using a sequence alignment on the basis of the crystal structures of the FBF-2 and Puf3p binding pockets, we identified additional C. elegans PUF proteins predicted to possess an upstream C-binding pocket. In vitro binding assays confirmed the importance of an upstream C for PUF-6 and PUF-11 RNA recognition, and a crystal structure of PUF-6 revealed conservation of the upstream C-binding pocket. Thus, what appeared initially to be a yeast PUF protein specialization is instead utilized more broadly.
Although the structures of the binding pockets are conserved, as is the chemical role of a key serine, the pockets of different proteins vary in two important ways. First, the preferred position of the upstream C relative to the 5′UGU motif in the core PUF recognition element varies. FBF-2 and PUF-11 accept a C at either −1 or −2, PUF-6 at only −1, and Puf3p at −2. Second, the quantitative effects of the upstream pocket differ substantially among the proteins. For the C. elegans PUF proteins, mutations that change the upstream C decrease in vitro affinity from 3-fold (PUF-6) to 20-fold (FBF-2) to 60-fold (PUF-11). The general features of these binding pockets are structurally conserved (main chain and serine/hydrophobic side chain interactions), but the specific contacts and conformation of the C-terminal helices of the PUF proteins may be responsible for these differences in affinity and specificity. The structure of FBF-2 is constant when bound to either −1C or −2C RNA.
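To put the fold changes quoted above on an energetic scale, they can be converted to an apparent change in binding free energy with the standard relation ΔΔG = RT ln(fold change). The sketch below assumes that each fold change is a ratio of dissociation constants and that the temperature is 25 °C; neither assumption is stated in the text, and the function name is illustrative.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)
TEMP_K = 298.15    # ASSUMED temperature (25 degrees C); not given in the text


def ddg_kcal(fold_change, temp_k=TEMP_K):
    """Apparent free energy penalty (kcal/mol) for a fold loss in affinity.

    Assumes fold_change is the ratio Kd(mutant) / Kd(wild type).
    """
    return R_KCAL * temp_k * math.log(fold_change)


# Fold decreases in affinity quoted in the text for loss of the upstream C.
fold_changes = {"PUF-6": 3, "FBF-2": 20, "PUF-11": 60}

ddg = {name: ddg_kcal(fold) for name, fold in fold_changes.items()}
for name, value in ddg.items():
    print(f"{name}: {value:.2f} kcal/mol")
```

Under these assumptions the penalties come to roughly 0.7, 1.8, and 2.4 kcal/mol, so the 20-fold span in fold change corresponds to a difference of under 2 kcal/mol.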
The upstream pockets described here invariably recognize cytosine residues. Mutant PUF proteins that also recognize a cytosine in the core recognition region have recently been selected with the yeast three-hybrid system (30, 31). However, the chemical and structural basis of cytosine recognition in those cases is completely different from that of the upstream pockets described here. The mutant proteins possess alterations within the RNA recognition side chains of a typical core recognition helix, with recognition dominated by interaction with an arginine side chain (31). Although core C-recognizing helices do appear to occur in nature, they are rare (31). In contrast, upstream C pockets appear to be widespread and utilize different chemical interactions to discriminate the cytosine.
Taken together with previous studies, our results indicate that the specificity of each PUF protein is determined by the fusion of RNA recognition features. The central element is a characteristic core recognition sequence. The interaction of this core sequence with PUF repeats 1-8 may be highly sequence-specific (e.g. Pumilio 1) or may contain conserved motifs and binding patterns whose sequence and spacing are critical for specificity (e.g. FBF-2). The presence or absence of additional recognition features, like the upstream C-binding pockets described here, modifies specificity. Other factors that contribute to regulatory activity include the binding affinity and selectivity associated with these recognition features and the localization and level of expression of the PUF protein.
Multiple chemical features of the protein-RNA interface act in concert to provide selectivity. Their combinatorial nature provides a foundation for understanding PUF control networks and their evolution. The changes required to modify RNA recognition are surprisingly simple. In the recognition helices, substitutions in one or two amino acids can expand or switch specificity. In the upstream pocket, a single mutation in the serine or cytosine alters affinity (Refs. 7, 20, 30–34 and this paper). Variations in the affinity of a particular PUF protein for a set of target mRNAs could cause their differential regulation, whereas overlaps in specificity between two PUF proteins could allow either their interference or coregulation. In some instances, a minimal affinity threshold might be required for activity, producing a regulatory switch. C. elegans fem-3 mRNA is exemplary. A single nucleotide change in the −2 position of its FBF binding site has profound consequences for the animal, including sterility. The identification of in vivo RNA targets and the analysis of the structural basis of selectivity are prerequisites for understanding how new specificities arise, disappear, and create new circuits of control.

FIGURE 7. Top, the FBF-2 upstream C-binding pocket with the −2C base circled in red. Sequences shown correspond to the now-established FBF binding element in the 3′UTR of fem-3 mRNA, which encodes a protein critical in the switch from spermatogenesis to oogenesis. This regulatory element was first identified through genetic selections (28). Bottom, C. elegans worms at three stages of development. Worms with the wild-type C residue in the upstream site develop normally and switch from making sperm (blue) to making oocytes (pink). Worms with a U substitution at the −2 position develop normally but are defective in the switch and make sperm incessantly. Gray indicates undifferentiated germ line cells.