A Conserved Three-nucleotide Core Motif Defines Musashi RNA Binding Specificity

Background: The Musashi family of RNA-binding proteins promotes progenitor cell proliferation. Results: The majority of Musashi sequence specificity comes from a UAG motif. Sequences outside of UAG make minor contributions to affinity. Conclusion: UAG forms the core Musashi recognition element. Significance: Delineating the sequences that contribute to binding is critical to understanding RNA target selection.
Musashi (MSI) family proteins control cell proliferation and differentiation in many biological systems. They are overexpressed in tumors of several origins, and their expression level correlates with poor prognosis. MSI proteins control gene expression by binding RNA and regulating its translation. They contain two RNA recognition motif (RRM) domains, which recognize a defined sequence element. The relative contribution of each nucleotide to the binding affinity and specificity is unknown. We analyzed the binding specificity of three MSI family RRM domains using a quantitative fluorescence anisotropy assay. We found that the core element driving recognition is the sequence UAG. Nucleotides outside of this motif have a limited contribution to binding free energy. For mouse MSI1, recognition is determined by the first of the two RRM domains. The second RRM adds affinity but does not contribute to binding specificity. In contrast, the recognition element for Drosophila MSI is more extensive than that of the mouse homolog, suggesting functional divergence. The short nature of the binding determinant suggests that protein-RNA affinity alone is insufficient to drive target selection by MSI family proteins.

The life of an RNA molecule is controlled by a suite of RNA-binding proteins (RBPs) that associate with specific regions of the RNA at specific times in its life. During transcription, the nascent pre-mRNA associates with components of the splicing machinery to ensure correct splicing patterns. Subsequently, the mRNA may be stabilized, destabilized, edited, or localized to a particular region of the cell, again due to specific protein associations. To carry out its task, an RBP must be able to identify and bind its precise target sequence in a heterogeneous mixture of nucleic acids. The fact that most RBP target sequences consist of short, often degenerate motifs makes the task even more complex.
The Musashi family of RNA-binding proteins is a widespread and highly conserved family that governs stem cell self-renewal and maintenance (1,2). MSI proteins are overexpressed in several tumor types, and their expression correlates with poor prognosis (3-7). There are two mammalian MSI family members: MSI1 and MSI2. MSI1 is expressed in neural and glial progenitor cells (8-10) as well as in progenitors of several epithelial tissues (11-13). MSI2 is 69% identical to MSI1 and is expressed in a partially overlapping set of tissues (14). MSI family members have also been identified in flies (15), worms (16), and frogs (17,18) among other species. MSI proteins contain two RRMs that provide the surface for interaction with target RNA. The first RRM has a higher affinity for RNA than the second (19,20). The C-terminal region of the protein interacts with poly(A)-binding protein, inhibiting translation of bound RNAs by competing with eIF4G (21).
Several different groups have investigated the RNA binding specificity of MSI family members. Okano and co-workers (22) used the in vitro evolution method SELEX to identify optimal binding motifs for mouse MSI1 and Drosophila MSI. Mouse MSI1 bound the sequence (G/A)U₁₋₃AGU. The Xenopus protein has been shown to interact with a similar sequence (17). Additionally, the NMR structure of MSI1 in complex with a five-nucleotide RNA of the sequence GUAGU has been recently solved, demonstrating how this sequence interacts with MSI1 (23). Contrasting with these observations, the coimmunoprecipitation and microarray-based approach RNAcompete identified a strong preference for the sequence UAG followed by a weaker preference for UUAG for mouse MSI1; this is substantially different from the SELEX sequence (24). A similar discrepancy exists for the Drosophila protein. SELEX experiments identified the sequence GUUU(G/AG) (25), whereas RNAcompete identified GUAG with a slight preference for an upstream A and a downstream G for Drosophila MSI (24). We wished to understand which of these motifs forms the optimal Musashi binding determinant.
A number of MSI mRNA targets have been identified, and MSI can regulate their translation positively or negatively. Most of these targets contain the MSI1 consensus sequence within their 3′-UTRs. In neural lineage cells, MSI1 binds the Numb 3′-UTR and inhibits its translation (22). By inhibiting NUMB activity, MSI1 activates the NOTCH pathway, promoting stem cell self-renewal. In contrast, MSI1 has been shown to up-regulate expression of NUMB in gastric tissue (26). MSI1 has also been shown to bind Robo3 mRNA and up-regulate protein production, controlling midline crossing of precerebellar neurons (27). In contrast to many of its other targets, MSI1 is thought to interact with a portion of the Robo3 mRNA coding region that does not contain the MSI1 consensus sequence (27). Other mRNA targets of MSI include doublecortin (28), CDKN1A (29), and c-mos (17).
To better understand the Musashi binding determinant, we set out to determine which nucleotides make the strongest thermodynamic contribution to binding of MSI to its target RNA. We used a mutational series of RNAs and analyzed the interaction between the RNAs and three MSI family members using fluorescence polarization assays. Our results demonstrate that all three MSI proteins bind a core UAG motif. In contrast to mouse MSI1 and MSI2, additional G nucleotides both upstream and downstream of UAG contribute to the interaction with Drosophila MSI. Our results reveal the core motif that is essential for binding. Additionally, they explain why different high throughput approaches have revealed seemingly different Musashi binding motifs.

EXPERIMENTAL PROCEDURES
RNAs-Synthetic RNAs labeled with fluorescein amidite at the 3′-end were purchased from Integrated DNA Technologies.
Plasmids-For the mouse dual RRM construct (RRM1-2), amino acids 7-192 of mouse MSI1 were amplified from a mammalian genome ORF clone (100014969; Invitrogen) using primers 5′-cgcgcggatcccagcccggcctcgcctcccc-3′ and 5′-gcgcgaagcttcggggacatcacctcctttg-3′. The fragment was digested with BamHI and HindIII and cloned into a modified version of expression vector pET-22b in which the leader sequence was replaced with a His tag and a tobacco etch virus protease site. To produce the single RRM construct, the dual RRM construct was modified by QuikChange (Stratagene) mutagenesis to replace Met-104 with a stop codon. For human MSI2, amino acids 8-198 were amplified from Mammalian Gene Collection ORF clone 3505639 using primers 5′-cgcgcggatccggcacctcgggcagcgccaa-3′ and 5′-gcgcgaagctttcatgggaacatgacttctttcg-3′. The fragment was digested with BamHI and HindIII and cloned into the modified pET-22b described above. For the Drosophila construct, amino acids 162-364 were amplified from a Drosophila Genome Resource Center cDNA clone (LD31631) using primers 5′-gggggagctccccagcctgagcggaggc-3′ and 5′-gggggtcgacttctacggtgtgactgcttcctt-3′. The fragment was digested with SacI and SalI and cloned into the His-modified pET-22b vector. For the Drosophila RRM1 construct, Gln-258 was changed to a stop codon using QuikChange mutagenesis.
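As a quick sanity check on the cloning strategy, the following minimal sketch confirms that each primer quoted above carries the recognition site of the enzyme used to digest its fragment. The recognition sequences are the standard ones for these enzymes; the forward/reverse labels and the pairing of each primer with one enzyme are assumptions made for illustration, since the text lists the primers only in order.

```python
# Standard recognition sequences for the restriction enzymes named in the text.
SITES = {"BamHI": "GGATCC", "HindIII": "AAGCTT", "SacI": "GAGCTC", "SalI": "GTCGAC"}

# Primers copied from the text. The forward/reverse labels and the enzyme
# assigned to each primer are assumptions (first primer = first enzyme listed).
PRIMERS = [
    ("mouse MSI1 forward", "cgcgcggatcccagcccggcctcgcctcccc", "BamHI"),
    ("mouse MSI1 reverse", "gcgcgaagcttcggggacatcacctcctttg", "HindIII"),
    ("human MSI2 forward", "cgcgcggatccggcacctcgggcagcgccaa", "BamHI"),
    ("human MSI2 reverse", "gcgcgaagctttcatgggaacatgacttctttcg", "HindIII"),
    ("Drosophila MSI forward", "gggggagctccccagcctgagcggaggc", "SacI"),
    ("Drosophila MSI reverse", "gggggtcgacttctacggtgtgactgcttcctt", "SalI"),
]


def site_position(primer, enzyme):
    """Return the 0-based index of the enzyme site in the primer, or -1 if absent."""
    return primer.upper().find(SITES[enzyme])


for name, seq, enzyme in PRIMERS:
    print(f"{name}: {enzyme} site at index {site_position(seq, enzyme)}")
```

In every case the site sits a few bases in from the 5′ end of the primer, leaving a short overhang upstream of the site.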
Protein Purification-Bacterial expression vectors were transformed into BL21(DE3), and protein expression was induced with 1 mM isopropyl β-D-thiogalactopyranoside. After 3 h at 37°C, cells were pelleted and lysed in 50 mM NaH₂PO₄, 300 mM NaCl, 20 mM imidazole, and 5 mM β-mercaptoethanol with a microfluidizer (IDEX Health and Science). The soluble lysate was applied to a nickel-nitrilotriacetic acid column (Qiagen); washed with 50 mM NaH₂PO₄, 300 mM NaCl, 50 mM imidazole, and 5 mM β-mercaptoethanol; and eluted with 50 mM NaH₂PO₄, 300 mM NaCl, 300 mM imidazole, and 5 mM β-mercaptoethanol. Fractions containing recombinant protein were then purified over a HiTrap SP column (GE Healthcare) in a buffer containing 50 mM MOPS, pH 6.0, 20 mM NaCl, and 2 mM DTT. Proteins were eluted with a 0.1-1.0 M NaCl gradient in the same buffer. Drosophila RRM1 was eluted with a 0.02-2 M NaCl gradient. Fractions containing recombinant protein were then further purified over a HiTrap Q column (GE Healthcare) in a buffer containing 50 mM Tris, pH 8.8, 20 mM NaCl, and 2 mM DTT, eluting with a 0.1-1.0 M NaCl gradient. The Drosophila proteins were recovered from the flow-through. Recombinant proteins were stored in 50 mM Tris, pH 8.0, 20 mM NaCl, and 2 mM DTT at 4°C.
Fluorescence Polarization-Fluorescence polarization (FP) assays were conducted as described elsewhere in detail except that the carrier bovine γ-globulin (Invitrogen) was added to the buffer (30). Briefly, a dilution series of recombinant protein was prepared in a buffer containing 50 mM Tris, pH 8.0, 100 mM NaCl, 0.01% IGEPAL CA-630, 0.01 mg/ml tRNA, and 0.1 mg/ml bovine γ-globulin. RNA was diluted in the same buffer to achieve a final concentration of 2 nM in the binding reaction. RNA was added to the protein dilutions and allowed to equilibrate for 3 h at room temperature. Polarization was detected with a Victor3 multilabel counter (PerkinElmer Life Sciences). For the dual RRM proteins, the maximal protein concentration in the dilution series was 2 µM. For the single RRM protein, the maximal concentration was increased to 4 µM.
Data Analysis-FP data were analyzed as described previously (30). Briefly, polarization (millipolarization) was plotted against the protein concentration (M). The data were then fit to the Hill equation to obtain the Kd,app. Each RNA was assayed in triplicate. In one case, where an accurate fit could not be achieved due to low affinity, the Hill coefficient was constrained to 1.0 during curve fitting. Krel was determined from Equation 1,
$$K_{rel} = \frac{K_{d,app2}}{K_{d,app1}} \quad \text{(Eq. 1)}$$
where Kd,app2 is the Kd,app of the mutant RNA and Kd,app1 is the Kd,app of the control RNA. The change in binding free energy (ΔΔG) due to the mutation was calculated according to Equation 2,
$$\Delta\Delta G = RT \ln K_{rel} \quad \text{(Eq. 2)}$$
where T is the temperature and R is the gas constant.
NMR-Labeling with ¹⁵N was performed by growing cells in isotopically enriched M9 medium with 1 g of ¹⁵NH₄Cl/liter. Two-dimensional ¹H-¹⁵N heteronuclear single quantum coherence spectra were collected using samples of U-¹⁵N-labeled MSI1 in a 90% H₂O and 10% D₂O buffer solution of 50 mM Tris, pH 7.0. A series of ¹H-¹⁵N heteronuclear single quantum coherence spectra of ¹⁵N-labeled MSI1 were collected at different concentrations of unlabeled RNA (either 5′-UUUAUUUUAGUUUUU-3′ or 5′-UUUAUAGUUUUU-3′). The titrations were terminated when changes of amide group chemical shift positions were no longer observed. All experiments were performed at 600 MHz on a Varian Inova spectrometer equipped with a triple resonance cold probe at 298 K. Data processing was performed using NMRPipe (31) and SPARKY (32) software.
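The Krel and ΔΔG calculation described under Data Analysis can be illustrated with a short sketch. It assumes that Krel is the mutant Kd,app divided by the control Kd,app and that ΔΔG = RT ln(Krel), which matches the positive ΔΔG values reported for weaker-binding mutants. The temperature of 298.15 K is an assumption (the binding reactions were equilibrated at "room temperature"), and the function names and the 210 nM mutant value are hypothetical.

```python
import math

R_KCAL = 1.987204e-3   # gas constant in kcal/(mol*K)
T_KELVIN = 298.15      # assumed temperature; the text says only "room temperature"


def k_rel(kd_mutant, kd_control):
    """Relative dissociation constant: mutant Kd,app divided by control Kd,app."""
    return kd_mutant / kd_control


def delta_delta_g(kd_mutant, kd_control, temperature=T_KELVIN):
    """Change in binding free energy (kcal/mol), computed as RT * ln(K_rel)."""
    return R_KCAL * temperature * math.log(k_rel(kd_mutant, kd_control))


# Hypothetical mutant that binds 10-fold more weakly than the 21 nM control RNA.
print(round(delta_delta_g(210.0, 21.0), 2))  # 1.36
```

Under these assumptions a 10-fold loss of affinity corresponds to roughly 1.4 kcal/mol, so a ΔΔG threshold of about 1 kcal/mol and a Krel near 10 describe mutations of similar severity.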
UAG Frequency Determination-The UTRs of RNAs determined to be associated with MSI by Vo et al. (3) were downloaded from the hg38 assembly of the human genome via the University of California at Santa Cruz genome browser. For the random background sets, all UTRs were retrieved from the University of California at Santa Cruz genome browser, and 50 UTRs were randomly selected using a random number generator. All sequence processing was conducted using custom Perl scripts.
For the trimer analysis, the number of occurrences of each three-nucleotide motif was determined, and the frequency of each motif in the set of UTRs was calculated. The MSI-associated set and the three random sets were processed separately. The frequency change was determined by subtracting the mean of the frequency in the random sets from the frequency in the MSI-associated set. Data were ordered for ease of visualization before plotting. Z-scores were calculated as described in Yeo et al. (33) using the random sets for expected values.
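The original analysis used custom Perl scripts that are not reproduced here. The following Python sketch shows one way the trimer frequency change could be computed. It assumes that trimers are counted at every position (overlapping) and that frequency means the motif count divided by the total number of trimers in the set; neither detail is stated in the text, and the function names and toy sequences are hypothetical.

```python
from collections import Counter


def trimer_frequencies(utrs):
    """Frequency of each overlapping trinucleotide across a set of sequences."""
    counts = Counter()
    for seq in utrs:
        seq = seq.upper().replace("T", "U")   # accept DNA-style input
        for i in range(len(seq) - 2):
            counts[seq[i:i + 3]] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {motif: n / total for motif, n in counts.items()}


def frequency_change(target_set, random_sets):
    """Target-set frequency minus the mean frequency over the random sets."""
    target = trimer_frequencies(target_set)
    backgrounds = [trimer_frequencies(s) for s in random_sets]
    motifs = set(target).union(*backgrounds)
    change = {}
    for motif in motifs:
        mean_bg = sum(bg.get(motif, 0.0) for bg in backgrounds) / len(backgrounds)
        change[motif] = target.get(motif, 0.0) - mean_bg
    return change


# Toy sequences standing in for the MSI-associated set and three random sets.
toy_change = frequency_change(["UAGUAG"], [["UUUUUU"], ["UAGUUU"], ["AAAAAA"]])
print(sorted(toy_change.items(), key=lambda kv: kv[1], reverse=True)[:3])
```

Sorting the result by value, as in the last line, corresponds to ordering the data before plotting.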
For the cluster analysis, data were analyzed in 50-nucleotide overlapping windows. Windows overlapped by one nucleotide. For each window, the number of UAGs was counted. The fraction of windows at each UAG content level was calculated. Each data set was analyzed independently. Statistical significance was determined using a t test, comparing the fraction of windows in the MSI-associated set with each of the three random sets. Statistical tests and population distribution calculations were conducted using IGOR Pro (Wavemetrics).
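A minimal sketch of the window analysis follows. The text states that windows "overlapped by one nucleotide", which could mean either that consecutive windows share a single nucleotide (a step of 49) or that the window slides one nucleotide at a time (a step of 1). The step is therefore left as a parameter, and the default of 1 is an assumption. Function names and the toy sequence are hypothetical.

```python
from collections import Counter


def count_motif(seq, motif="UAG"):
    """Count occurrences of motif in seq, checking every start position."""
    return sum(1 for i in range(len(seq) - len(motif) + 1) if seq.startswith(motif, i))


def window_fractions(utrs, window=50, step=1, motif="UAG"):
    """Fraction of windows containing 0, 1, 2, ... copies of the motif."""
    tally = Counter()
    for seq in utrs:
        seq = seq.upper().replace("T", "U")   # accept DNA-style input
        # Sequences shorter than one window contribute no windows.
        for start in range(0, len(seq) - window + 1, step):
            tally[count_motif(seq[start:start + window], motif)] += 1
    total = sum(tally.values())
    if total == 0:
        return {}
    return {k: v / total for k, v in sorted(tally.items())}


# Toy 53-nucleotide sequence: one UAG at the 5' end followed by 50 uridines.
print(window_fractions(["UAG" + "U" * 50]))  # {0: 0.75, 1: 0.25}
```

The per-set fractions returned here are the quantities that would then be compared between the MSI-associated set and each random set.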
Homology Modeling-The RNA-bound structure of MSI1 RRM1 (Protein Data Bank code 2RS2) was downloaded from the Protein Data Bank. The composite file was split into separate entries, and the structures were visualized using PyMOL (Schrödinger, LLC). The homology model of Drosophila RRM1 was created using SWISS-MODEL (34).

RESULTS
Binding of Mouse MSI1 RRM1-2 to Mutant RNAs-Based upon the binding aptamer identified by Okano and co-workers (22) using SELEX, we inserted the sequence AUAGU into a background of U to produce a 12-nucleotide RNA fragment of the sequence 5′-UUUAUAGUUUUU-3′. This was the control RNA sequence that served as the starting point for the mutational series. We analyzed the interaction between the control RNA and mouse MSI RRM1-2 with fluorescence polarization. The Kd,app of the interaction is 21 ± 1 nM (Fig. 1A).
To analyze the contribution of each nucleotide position to the interaction, we designed a mutational series of RNAs in which each position is individually mutated to each of the three remaining nucleotides, generating 36 mutant variants in total. The largest changes in binding were due to mutation of U5, A6, or G7. Mutation of U5 to A or C and mutation of A6 and G7 to any of the other nucleotides caused a change in standard free energy (ΔΔG) of more than 1 kcal/mol with a Krel of greater than 10 (Fig. 1B and Table 1). Mutation of position U3 to a C had a less severe effect on the interaction (ΔΔG = 0.67 ± 0.05 kcal/mol). All other mutations had little to no effect on binding free energy. The mutational analysis shows that binding is energetically driven by the sequence UAG at positions 5 through 7 with a slight additional contribution from the absence of a C at position 3.
We noticed that mutation of U5 to G introduced a new UAG motif within the RNA. To determine whether the lack of an effect of this mutation is due to the presence of the new UAG, we analyzed an RNA of the sequence 5′-UUGAGAGUUUUU-3′. This RNA contains a G at position 3 in addition to position 5. The mutations at position 3 or 5 alone had little to no effect on binding (Fig. 1B). However, the double mutation had a more severe effect with a ΔΔG of 1.35 ± 0.04 kcal/mol. Together, the data indicate that UAG provides the critical associations between MSI RRM1-2 and the RNA.
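The design of the mutational series can be reproduced with a few lines of code. The sketch below enumerates every single-nucleotide substitution of the control RNA and lists the variants that no longer contain any UAG. The label format (original base, 1-based position, new base) is a naming convention chosen here for illustration.

```python
CONTROL = "UUUAUAGUUUUU"   # control RNA sequence from the text
BASES = "ACGU"


def single_mutants(seq):
    """Yield (label, sequence) for every single-nucleotide substitution."""
    for i, original in enumerate(seq):
        for base in BASES:
            if base != original:
                # Label such as "U5G": original base, 1-based position, new base.
                yield f"{original}{i + 1}{base}", seq[:i] + base + seq[i + 1:]


mutants = dict(single_mutants(CONTROL))
lacking_uag = sorted(label for label, s in mutants.items() if "UAG" not in s)

print(len(mutants))      # 36
print(mutants["U5G"])    # UUUAGAGUUUUU
print(lacking_uag)       # ['A6C', 'A6G', 'A6U', 'G7A', 'G7C', 'G7U', 'U5A', 'U5C']
```

The eight variants without a UAG are the same eight substitutions reported to change ΔΔG by more than 1 kcal/mol, and U5G is absent from that list because it creates a new UAG at positions 3 through 5.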
Binding of Mouse MSI1 RRM1-2 Tolerates Spacing Flexibility-SELEX experiments performed by Okano and co-workers (22) suggest that one to three uridines are tolerated in the MSI1 binding motif between positions 3 and 5. To determine whether an increase in the number of uridines alters the free energy of binding, we generated a series of spacing mutants in which the position of the U is expanded to accommodate up to five uridines. We found that the interaction of MSI RRM1-2 was insensitive to insertions in this position (Fig. 1C). Moreover, mutation of U to C in position 3 caused a decrease in the affinity of the interaction regardless of the presence of the insertion (Fig. 1D). The data confirm that insertion of uridines upstream of the UAG is tolerated and show that as many as five (if not more) uridines can be accommodated with no impact on binding. The data are consistent with a model in which a flexible stretch of bases may loop out of the recognition site.
Binding Specificity Is Determined by RRM1-NMR experiments conducted by Katahira and co-workers (23) have suggested that many of the amino acids that contact RNA in RRM2 are similar to those making contacts in RRM1. We were interested in determining whether the presence of RRM2 alters the specificity of the MSI RRM1 protein-RNA interaction. We therefore analyzed the mutation series to determine whether the specificity of RRM1 is the same. The single RRM protein interacts with the control RNA of sequence 5′-UUUAUAGUUUUU-3′ with a Kd,app of 68 ± 3 nM (Fig. 2A). The affinity is 3-fold weaker than that of the dual RRM construct (Fig. 1A). Analysis of the mutant series revealed that, just as for the two RRM construct, the nucleotides that have the strongest contribution to the affinity of the interaction are the central UAG (Fig. 2B and Table 2). Therefore, the second RRM domain contributes to the affinity of the interaction but does not alter its specificity.
MSI1 RRM1-2 has a similar affinity for RNAs with uridine insertions upstream of the UAG motif (Fig. 1C). We wished to determine whether RNAs with and without uridine insertions are recognized in the same way. Analysis of the ¹H-¹⁵N heteronuclear single quantum coherence spectra of MSI1 RRM1 bound to two different RNA sequences (5′-UUUAUAGUUUUU-3′ and 5′-UUUAUUUUAGUUUUU-3′) shows that the signals of the amide groups of MSI1 in the two complexes mostly overlap (Fig. 2C). The average chemical shift difference between corresponding residues in the two complexes is 0.033 ± 0.036 ppm. In addition, the residues that show the largest chemical shift changes agree with those reported for binding with the RNA sequence 5′-AGCGUUAGUUAUUUAGUUCG-3′ (35). Because the chemical shift of a nucleus is very sensitive to its chemical environment, this result indicates that the mode of RNA recognition is the same in the three RNA-bound complexes and that the RNA-bound structure of MSI1 is very similar when bound to either 5′-UUUAUAGUUUUU-3′ or 5′-UUUAUUUUAGUUUUU-3′.
We first asked whether the motif UAG is more frequent in MSI target mRNAs than expected. For a background set, we produced three different sets of 50 randomly selected UTRs from the human genome. We counted the number of occurrences of each trinucleotide motif and determined its frequency in each set. We then examined the difference in the frequency of each motif in the MSI-associated set compared with the background sets. We found that UAG with a frequency change of 3.6 × 10⁻³ is one of the top enriched motifs in the MSI-associated set (Fig. 3A). The top seven enriched motifs each include only combinations of U and A, possibly pointing to an unstructured quality of the MSI-associated UTRs (Fig. 3A). The only remaining trinucleotide with an enrichment greater than that of UAG in the set is CUA, which has a frequency change of 3.7 × 10⁻³. The binding data presented here do not explain why CUA trinucleotides are enriched in mRNAs that co-immunoprecipitate with MSI1.
Because RNAs with multiple MSI binding sites may be more likely to associate with MSI in vivo, we next examined whether there are clusters of UAG in the MSI-associated set. Again we compared the data for the MSI-associated RNAs with data obtained from the same three randomly selected background UTR sets used in Fig. 3A. Each UTR was divided into 50-nucleotide overlapping tiles, and the number of UAG motifs within each tile was counted. For each number of motifs, the fraction of tiles with that number was determined. We found a significant decrease in the number of tiles with no UAG motifs in the MSI-associated set (Fig. 3, B and C), whereas the number of tiles with a single UAG was unchanged. We found a significant increase in the number of tiles with two UAG motifs in the MSI-associated set (Fig. 3, B and C). No other numbers of motifs were significantly changed in comparison with all three random sets.
To compare the enrichment of UAG in the MSI-associated UTR set with the enrichment of the binding site of FOX2, another RBP that has a larger binding site with a well defined specificity, we calculated the Z-score. The Z-score for UAG was 2.79, indicating that UAG enrichment is unlikely to occur by chance. In comparison, the reported Z-score for the target sequence of FOX2 (UGCAUG) is 25.16 (33), indicating a greater significance of FOX2 binding site enrichment. It is important to note, however, that differences between the two data sets (3′-UTRs of MSI-associated RNAs versus cross-link immunoprecipitation sequence clusters) make direct comparison difficult.
In summary, we found a decrease in the number of 50-nucleotide windows with no UAG motifs and an increase in the number of 50-nucleotide windows with two UAG motifs in the MSI1-associated set. Additionally, we found that UAG was one of the most enriched trimers in the MSI-associated data set. These observations support our finding that UAG is the core motif driving association.
Human MSI2 Recognizes the Same Sequence as Mouse MSI1-The high degree of conservation within the MSI protein family suggests that their recognition sites may be the same. To test this with another family member, we assessed the binding of human MSI2 RRM1-2 to the same library of mutant RNAs used for mouse MSI1. The two RNA binding domains are 87% identical (Fig. 4). Human MSI2 bound the control sequence 5′-UUUAUAGUUUUU-3′ with a Kd,app of 28 ± 4 nM. Mutational analysis revealed that, as for MSI1, the largest contributions to binding energy come from the UAG element (Fig. 5 and Table 3).
Drosophila MSI Recognizes a Broader Region of RNA than Mouse MSI1-MSI family members are highly conserved among many different animal phyla. One of the most divergent members is the namesake of the family, Drosophila MSI. Drosophila has another MSI family member, known as RBP6, that is more similar to the mammalian proteins (36). We wished to determine whether Drosophila MSI has the same preference for UAG as mouse MSI1. Previous SELEX experiments by Okano and co-workers (25) have identified a motif consisting of GU₃₋₆(G/AG), suggesting that the motif recognized by this protein is different from the mouse MSI1 motif.
A sequence alignment of RRM1 from Drosophila and mouse reveals that many of the amino acids that are involved in RNA recognition are conserved between the two proteins, suggesting that the mode of RNA recognition may be the same (Fig. 6A). To assess the possibility that binding requirements may differ for Drosophila MSI, we prepared a homology model of Drosophila MSI RRM1 based upon the recently published structure of mouse RRM1 bound to RNA of the sequence GUAGU (23). In the homology model, many features of the association, including both hydrophobic and electrostatic interactions, are preserved (Fig. 6B). The model therefore predicts that the sequence specificity of the two proteins should be similar.
We therefore analyzed the affinity of Drosophila RRM1-2 for the control RNA used in the mouse mutational analysis. Drosophila RRM1-2 recognized this RNA with a Kd,app of 172 ± 1 nM (Fig. 7A) in contrast to mouse RRM1-2, which interacted with a Kd,app of 21 ± 1 nM. Because the published consensus sequence of Drosophila MSI is the sequence GU₃₋₆(G/AG) (25), we also analyzed the mutant that contained a G at position 11, 5′-UUUAUAGUUUGU-3′. This RNA interacted with Drosophila MSI RRM1-2 with a Kd,app of 53 ± 1 nM (Fig. 7A). The same mutation had no effect on the affinity of mouse RRM1-2 (Fig. 1A). Sequence requirements of the two proteins are therefore not identical.
To determine which nucleotides within the RNA make the strongest contributions to binding affinity with Drosophila RRM1-2, we designed a mutation series of RNAs based upon the control sequence 5′-UUUAUAGUUUGU-3′. As we would predict from the homology model, the UAG motif contributes binding energy to the interaction. Mutation of G7 to any other nucleotide caused a change in the free energy of association of greater than 0.5 kcal/mol (Fig. 7B and Table 4). Likewise, mutation of U5 to A or C caused a ΔΔG of more than 0.5 kcal/mol (Fig. 7B and Table 4). It is likely that mutation of this position to a G has no apparent effect because it creates a second UAG, as observed for the mouse protein. Mutation of A6 to C caused a ΔΔG of more than 0.5 kcal/mol, whereas mutation to G or U caused a change of ~0.5 kcal/mol (Fig. 7B and Table 4).
(Sequence alignment figure legend: Uppercase letters correspond to amino acids that are conserved in at least two of the three proteins. Lowercase letters represent amino acids that are not conserved.)
We found several differences in the sequence specificity of the Drosophila protein. The first is a strong preference for a G in position 4 (Fig. 7B). Although the NMR structure of the mouse protein was solved with an RNA that contained a G in this position (23), our mutational analysis showed an equal preference for an A at this site (Fig. 1B). Given that the amino acids predicted to contact this nucleotide are conserved between the two proteins, it is unclear why the mouse protein would have equal preference for A and G whereas the Drosophila protein strongly prefers a G. The structure of mouse MSI1 bound to RNA shows that G1 is recognized via hydrogen bonds to the N7 and 2′-OH and by stacking with tryptophan 29. These contacts should equally accommodate an A or G as we observe for the mouse protein but not the fly protein. It is likely that additional contacts enforce the G preference in the Drosophila variant. The second is that we observed that mutations at positions 9 and 11 also produced changes of ~0.5 kcal/mol (Fig. 7B). Because the homology model does not predict an extended interface, there must be aspects of the interaction between the Drosophila protein and the RNA that are not predicted by the mouse interaction.
The observation that Drosophila MSI interacts with a longer nucleotide sequence than the mouse protein suggests the possibility that RRM2 contributes to the specificity of the Drosophila protein unlike its mouse homologs. To test this idea, we prepared Drosophila MSI RRM1 and analyzed its interaction with the same series of RNAs analyzed for Drosophila RRM1-2. The affinity of RRM1 for the starting sequence, 5′-UUUAUAGUUUGU-3′, was 558 ± 23 nM, which is 10-fold weaker than that of RRM1-2 (53 ± 1 nM) (Table 5), suggesting that RRM2 contributes to the affinity of MSI for RNA as it does for mouse. We observed similar sequence requirements for RRM1 at positions 4 and 11 as for RRM1-2, suggesting that RRM1 defines the specificity of the interaction and that Drosophila RRM1 makes different contacts with RNA than the mouse protein does (Fig. 8A and Table 5). Despite the similarities in nucleotide requirements, the magnitude of the change in affinity upon mutation was not as large as that of RRM1-2 (Figs. 7B and 8B). The reduction is likely due to the inherent difficulties of measuring sequence-specific binding of weaker interactions, such as nonspecific binding at high protein concentration and underestimation of the Kd of the weakest binders during curve fitting.

DISCUSSION
We have delineated the contribution of each nucleotide to binding affinity for three MSI family members, mouse MSI1, human MSI2, and Drosophila MSI. For mouse MSI1 and human MSI2, the three-nucleotide core motif UAG is responsible for the majority of the binding affinity. There are two major differences between binding of the Drosophila and the mammalian proteins with RNA. First, Drosophila MSI requires a more extended motif, including the presence of a downstream guanosine. This guanosine appears to be specified by RRM1. Second, mouse MSI1 is able to recognize the sequences GUAGU and AUAGU with similar affinities, whereas Drosophila MSI has a higher affinity for the sequence GUAGU. The reason for these differences is unclear given that the majority of the amino acids contacting RNA in the mouse structure are conserved with the Drosophila protein. However, visual inspection of the sequence of RRM1 in the two species reveals a lysine at position 244 that is a threonine in the mouse (Fig. 5). This amino acid is adjacent to Lys-243, which is predicted to make contacts with G1. Lys-244 is therefore a candidate factor that may contribute to guanosine specification by Drosophila MSI RRM1. Additionally, the RNAcompete Drosophila binding site indicates a weaker preference for an A upstream of the G, further suggesting that there may be aspects of the interaction between Drosophila MSI and RNA that differ from the mouse interaction.
We have also determined that insertion of a stretch of up to four additional uridines immediately upstream of the UAG but downstream of the G or A has no effect on binding affinity. Although the identity of the nucleotide at position 4 has a relatively small impact on affinity, the presence of a cytidine at position 3 reduced affinity by ~0.5 kcal/mol even when up to three uridines were present. The effect was reduced in mutants that contained more than three uridines. Identification of cross-link immunoprecipitation tags for MSI would reveal whether MSI-bound RNAs in cells contain insertions or other sequence variations within this region.
The results of our analyses provide an explanation for the differences between MSI consensus sequences determined by different methods. For the mouse protein, our data provide support for the motif identified by SELEX, (G/A)U₁₋₃AGU (22). Additionally, our data are consistent with the recently published NMR structure of MSI bound to GUAGU (23). Finally, our observation that UAG is the critical binding element provides an explanation as to why the sequence UAGUUAG was identified by RNAcompete with a strong preference for the first UAG and a weak preference for the following four nucleotides (24). The second UAG is likely to be favored due to avidity effects caused by the presence of a second interaction site. For the Drosophila protein, our data expand upon the motif identified by SELEX, GUUU(G/AG) (25), as they reveal the need for a UAG and the similarity between the mouse and Drosophila interactions. Finally, our data expand upon the consensus sequences identified by SELEX by revealing the relative contribution of each nucleotide to the interaction affinity. Such information is critical in the identification of suboptimal binding sites, which may be more accessible than optimal sites for interaction with competing regulatory factors within a cell.
The preference of the Musashi family proteins appears to be unique among RNA-binding proteins. According to the resource provided by RNAcompete, the RNA-binding proteins with the most closely related binding determinants specifying UAG are pufferfish heterogeneous nuclear ribonucleoprotein AB and human DAZAP1 (24). Each of these contains a UAG within the context of a larger motif. Several other proteins also contain UAGs within larger motifs, including heterogeneous nuclear ribonucleoproteins A1 and A2B1 (24,37). More work is needed to assess the relative contribution of individual nucleotides within each motif.
Recently published RNA immunoprecipitation experiments using human MSI1 have identified recognition sites for the two RRMs within targets, that is the presence of a GUAG for RRM1 and a UAG for RRM2 (3). However, NMR experiments have shown that RRM2 has limited affinity for RNA due to its surface electrostatic potential and its relative rigidity (20). Our experiments demonstrate that for 12-nucleotide RNAs the presence of RRM2 increases the affinity of interaction without changing the nucleotide requirements. In other words, the central UAG is the major binding determinant in the presence and absence of RRM2. Despite this fact, when we analyzed the number of UAG motifs in RNAs that have been shown previously to associate with MSI (3), we were able to detect an increase in the frequency of 50-nucleotide windows that contain two UAG motifs. There are two possible explanations for this observation. First, recovery after MSI1 immunoprecipitation is more probable for RNAs that contain more UAG motifs and may be bound to multiple MSI1 proteins due to avidity. Second, multiple UAG motifs could be required for in vivo association in a way not described by our study.
RRMs are highly versatile RNA-binding platforms. They have been shown to interact with nucleotide sequences that vary in length from two nucleotides for the second RRM of nucleolin and TDP-43 (38,39) and three nucleotides for deleted in azoospermia-like (DAZL) (40) to eight nucleotides for the first RRM of U2B″ (41). The lengths of the binding determinants for both mouse and Drosophila Musashi proteins are within this range. Many RRM-containing proteins have two or more tandem RRMs. Recognition of RNA by both RRMs would be expected to increase affinity and specificity. For TDP-43, this has been shown to be the case (39). However, for both mouse and Drosophila Musashi proteins, the presence of the second domain does not affect the specificity of the interaction.
Understanding the forces that drive RBPs to select specific RNA targets from a heterogeneous pool of nucleic acids is central to understanding how they control gene expression. The relative importance of the three-nucleotide UAG MSI motif is seemingly at odds with its relative frequency in the pool of cellular RNAs. For MSI and other proteins with short recognition elements, it is likely that other factors in addition to binding site recognition contribute to the process of target selection. Understanding the nature of these factors and their contribution to target selection will be an exciting area of research in the future.