Structure of a PLS-class Pentatricopeptide Repeat Protein Provides Insights into Mechanism of RNA Recognition

Background: Pentatricopeptide repeat (PPR) proteins are sequence-specific RNA-binding proteins involved in organelle RNA processing.
Results: We identified RNA-binding sites of a small PPR protein (THA8L) from Arabidopsis thaliana and solved its crystal structure.
Conclusion: THA8L-RNA binding is dependent on a combination of specific nucleotide base interactions and nonspecific backbone interactions.
Significance: This work advances our understanding of the mechanism of PPR protein-RNA interaction.

Pentatricopeptide repeat (PPR) proteins are sequence-specific RNA-binding proteins that form a pervasive family of proteins conserved in yeast, plants, and humans. The plant PPR proteins are grouped mainly into the P and PLS classes. Here, we report the crystal structure of a PLS-class PPR protein from Arabidopsis thaliana called THA8L (THA8-like) at 2.0 Å. THA8L resembles THA8 (thylakoid assembly 8), a protein that is required for the splicing of specific group II introns of genes involved in biogenesis of chloroplast thylakoid membranes. The THA8L structure contains three P-type PPR motifs flanked by one L-type motif and one S-type motif. We identified several putative THA8L-binding sites, enriched with purine sequences, in the group II introns. Importantly, THA8L has strong binding preference for single-stranded RNA over single-stranded DNA or double-stranded RNA. Structural analysis revealed that THA8L contains two extensive patches of positively charged residues next to the residues that are proposed to comprise the RNA-binding codes. Mutations in these two positively charged patches greatly reduced THA8L RNA-binding activity. On the basis of these data, we constructed a model of THA8L-RNA binding that is dependent on two forces: one is the interaction between nucleotide bases and specific amino acids in the PPR motifs (codes), and the other is the interaction between the negatively charged RNA backbone and positively charged residues of PPR motifs. Together, these results further our understanding of the mechanism of PPR protein-RNA interactions.

The pentatricopeptide repeat (PPR) proteins form an exceptionally large family of conserved RNA-binding proteins found in yeast, plants, and humans. In higher plants, PPR proteins are involved primarily in mitochondrial and chloroplast RNA processing, with >400 members in Arabidopsis thaliana (1,2). PPR proteins are characterized by degenerate 35-amino acid repeats arranged in tandem, are localized predominantly to chloroplasts and mitochondria, and take part in virtually all processes that affect RNA metabolism in these organelles (3-5). Plant chloroplasts contain a complex genetic framework to realize the genetic information encoded in their DNA. Chloroplasts in land plants have retained ≥100 genes, most of which encode components of the basal chloroplast gene expression machinery or subunits of photosynthetic complexes. Post-transcriptional aspects of gene expression in land plant chloroplasts are particularly complex, including RNA editing, segmental mRNA stabilization, and the splicing of ≥20 group II introns (6,7). The expression of the small chloroplast genome requires hundreds of nuclear gene products; among them are PPR proteins, which play a crucial role in recognizing and regulating RNA processing through their ability to bind target RNA in a sequence-specific manner (8-10).
PPR proteins can be separated into two main classes, denoted P and PLS, depending on the subtypes of PPR motifs. P-class PPR proteins contain tandem arrays of 35-amino acid PPR motifs, each typically ending with a proline (P). Members of this class have been implicated in RNA stabilization, processing, splicing, and translation. PLS-class proteins contain alternating canonical P-type motifs and variant "long" (L)- and "short" (S)-type motifs and function mainly in RNA editing (3). PPR proteins bind RNA through the contributions of individual PPR motifs, which are predicted to adopt a two-α-helix structure (11,12). A combinatorial amino acid code for RNA recognition by PPR proteins has recently been proposed for P- and S-type motifs but not for L-type motifs, which are proposed to function as linkers that connect P- and S-type motifs in PPR proteins (1,13). This putative code of RNA recognition is consistent with data on engineered PPR proteins and has greatly enhanced our understanding of PPR protein-RNA interactions (2,5).
Structural studies of PPR-containing proteins have revealed the overall arrangement of PPR motifs and have shed light on the mechanism of RNA binding by PPR motifs. The crystal structure of PRORP1 (proteinaceous RNase P1) from A. thaliana reveals a prototypical metallonuclease domain tethered to a PPR domain by a zinc-binding domain, where the PPR domain enhances catalytic activity through interaction with pre-tRNA (14). The structure of human IFIT5 in complex with a 5′-triphosphate RNA reveals that PPR motifs interact with the triphosphate groups of nucleotides but make no base contacts (15). Thus, the structural mechanism of sequence-specific RNA binding by PPR proteins is not understood, and it is anticipated that structures of PPR proteins will contribute to our understanding of PPR protein-RNA recognition.
THA8 (thylakoid assembly 8) is a maize gene originally identified from a screen for nuclear mutations that cause defects in the biogenesis of chloroplast thylakoid membranes. The THA8 gene encodes a small protein that is localized to chloroplasts, where it is required for the splicing of the ycf3-2 and trnA group II introns, which contain multiple THA8-binding sites (7). Analysis of the THA8 ortholog in A. thaliana showed that the molecular functions of the two proteins are conserved. Null mutations in THA8 are embryo-lethal in Arabidopsis and seedling-lethal in maize. Whereas most PPR proteins have >10 PPR motifs, THA8 belongs to a subfamily of plant PPR proteins with a predicted four and a half PPR motifs and has the potential to mediate specific RNA binding in vivo despite its small size. THA8L (THA8-like) protein from A. thaliana is a small PPR protein that resembles THA8 and is predicted to comprise four PPR motifs, but its function is unknown. In this study, we solved the THA8L crystal structure, identified its RNA-binding site, and performed mutational and biochemical analyses of THA8L-RNA interactions. These results provide important insights into the function of THA8L in specific RNA binding and processing.

EXPERIMENTAL PROCEDURES
Protein Expression and Purification-THA8 (residues 34-263), AtTHA8 (residues 34-222), or THA8L (residues 60-257) was expressed as a His6-small ubiquitin-like modifier (SUMO) fusion protein from the expression vector pET24a (Novagen). The modified fusion protein contains a His6 tag (MKKGHHHHHHG) at the N terminus and a ULP1 protease site between the SUMO and PPR proteins. BL21(DE3) cells transformed with the expression plasmid were grown in LB broth at 16°C to an A600 of ~1.0 and induced with 0.1 mM isopropyl 1-thio-β-D-galactopyranoside for 16 h. Cells were harvested, resuspended in 35 ml of extract buffer (20 mM Tris (pH 8.0), 200 mM NaCl, and 10% glycerol) per 2 liters of cells, and passed three times through a French press with pressure set at 1000 pascals. The lysate was centrifuged at 16,000 rpm (30,800 × g) in a Sorvall RC12BP rotor for 30 min, and the supernatant was loaded on a 15-ml nickel HP column. The column was washed with 150 ml of 10% buffer A (20 mM Tris (pH 8.0), 200 mM NaCl, 500 mM imidazole, and 10% glycerol) and eluted in two steps with 100 ml of 50% buffer A and then 50 ml of 100% buffer A. The eluted His6-SUMO-AtTHA8, His6-SUMO-THA8L, and His6-SUMO-THA8 proteins were dialyzed against extract buffer and cleaved overnight with ULP1 at a protease/protein ratio of 1:500 in the cold room. The cleaved His6-SUMO tag was removed by passing through a nickel HP column, and the protein was further purified by chromatography on a HiLoad 26/60 Superdex 200 gel filtration column in 25 mM Tris (pH 8.0), 200 mM ammonium acetate, 1 mM dithiothreitol, and 1 mM EDTA. Apo-AtTHA8, apo-THA8L, and apo-THA8 each eluted as a sharp single peak with an estimated molecular mass of 23 kDa, suggesting that all three proteins are monomers in solution.
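As a quick consistency check on the centrifugation step, rotor speed and relative centrifugal force (RCF) are related by the standard expression RCF = 1.118 × 10^-5 × r × rpm^2, with r in centimeters. The sketch below back-calculates the effective rotor radius implied by the quoted 16,000 rpm and 30,800 × g. The radius itself is not stated in the protocol, so the value obtained is an inference, and the function names are illustrative only.

```python
RCF_CONSTANT = 1.118e-5  # standard factor for radius in cm and speed in rpm


def rcf_from_rpm(rpm: float, radius_cm: float) -> float:
    """Relative centrifugal force (multiples of g) at a given speed and radius."""
    return RCF_CONSTANT * radius_cm * rpm ** 2


def radius_from_rcf(rcf: float, rpm: float) -> float:
    """Effective rotor radius (cm) implied by a stated RCF and rotor speed."""
    return rcf / (RCF_CONSTANT * rpm ** 2)


# Values quoted in the purification protocol; the radius is inferred, not reported.
implied_radius_cm = radius_from_rcf(30_800, 16_000)
print(f"implied effective radius: {implied_radius_cm:.2f} cm")
```

The same two functions can be used in the other direction when a protocol has to be transferred to a rotor with a different radius.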
To prepare selenomethionine (SeMet)-substituted THA8L protein, the pET24a-His6-SUMO-THA8L expression plasmid was transformed into B834 methionine auxotroph cells. A single colony was inoculated into 2 liters of LB broth with antibiotics + 1% glucose and shaken overnight at 30°C.

Crystallization-Purified THA8L protein was concentrated to ~10-15 mg/ml prior to crystallization trials. Initial screening identified polyethylene glycol (PEG) 3350 as favorable for crystal formation. Optimization trays using PEG were set up manually using the hanging drop method at 20°C. Rod-shaped crystals with a length of ~200 μm in the longest dimension were obtained using 1 μl of the purified protein and 1 μl of well solution (12% (w/v) PEG 3350 and 4% (v/v) Tacsimate (pH 4.0)). These crystals diffracted x-rays to ~2.0 Å at the LS-CAT beamline of the Advanced Photon Source synchrotron. SeMet-substituted THA8L protein was crystallized using 1 μl of the purified protein and 1 μl of well solution (13% (w/v) PEG 3350, 2% (v/v) Tacsimate (pH 4.0), and 0.1 M sodium acetate (pH 4.6)). Rod-shaped crystals were obtained with a length of ~100 μm in the longest dimension.
Data Collection and Structure Determination-All crystals were transferred to well solution with 22% (v/v) ethylene glycol as cryoprotectant before flash-freezing in liquid nitrogen. Data collection was performed at beamline 21-ID-D (LS-CAT) of the Advanced Photon Source synchrotron. A native data set was collected to 2.0 Å. Initial structure determination by molecular replacement using the PPR domain from the crystal structure of PRORP1 (Protein Data Bank code 4G23, which shares ~20% sequence identity with THA8L) as a search model failed to yield any correct solution. To solve the phase problem, one data set of the SeMet-substituted THA8L crystal was collected at a wavelength of 0.9793 Å (peak wavelength) using an inverse-beam strategy to measure the selenium anomalous signal. The data were processed using XDS (16), combined using Pointless, and scaled using Scala of the CCP4 suite (17). Initial phases were established using the SHELX program (18) with the single selenium anomalous data set by single-wavelength anomalous diffraction phasing (Table 1). A total of 17 selenium sites were identified by SHELXD with a CCall/CCweak score of 61.0/38.0, and subsequent phasing was performed using SHELXE. Density modification for the initial electron density map was performed using DM (19). A crude model was built automatically using the CCP4 program Buccaneer with R/Rfree of 0.341/0.396. The model was further improved by several cycles of manual building using Coot (20) and refinement with the REFMAC program of CCP4 (21) to an R factor of 0.21 and an Rfree factor of 0.23. The SeMet-substituted THA8L crystal has a space group of P2₁ with two molecules/asymmetric unit. One of the two models from the SeMet crystal was used for refinement against the native data set, which was collected from a crystal in the C2 space group with one molecule/asymmetric unit. The final structure model for THA8L was refined to an R factor of 0.20 and an Rfree factor of 0.25 (Table 1).
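For readers less familiar with the refinement statistics quoted above, the crystallographic R factor is conventionally defined as the sum of | |Fobs| - |Fcalc| | over all reflections divided by the sum of |Fobs|; Rfree is the same quantity computed over a test set of reflections excluded from refinement. The following is a minimal sketch of that definition only. It assumes the calculated amplitudes are already on the same scale as the observed ones, uses made-up toy amplitudes, and is not a description of how REFMAC computes its statistics internally.

```python
from typing import Sequence


def r_factor(f_obs: Sequence[float], f_calc: Sequence[float]) -> float:
    """Conventional R factor: sum(||Fobs| - |Fcalc||) / sum(|Fobs|).

    Assumes f_calc has already been scaled to f_obs (an assumption of this sketch).
    """
    if len(f_obs) != len(f_calc):
        raise ValueError("f_obs and f_calc must have the same length")
    denominator = sum(abs(o) for o in f_obs)
    if denominator == 0:
        raise ValueError("observed amplitudes must not all be zero")
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    return numerator / denominator


# Toy amplitudes, invented purely for illustration.
toy_obs = [100.0, 80.0, 60.0, 40.0]
toy_calc = [90.0, 85.0, 66.0, 36.0]
toy_r = r_factor(toy_obs, toy_calc)
```

Applying the same function to the working set and to the held-out test set gives R and Rfree, respectively; a gap between the two that grows during refinement is the usual sign of overfitting.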
Assays for Interactions between THA8L and Nucleic Acids-Interactions between AtTHA8, THA8L, or THA8 and biotin-Zea 1a RNA were assessed by luminescence-based AlphaScreen technology (PerkinElmer Life Sciences), which our group has used extensively to determine protein-protein or protein-small molecule interactions (22,23). Biotin-RNAs were attached to streptavidin-coated donor beads, and His6-tagged THA8 and THA8L proteins were attached to nickel-chelated acceptor beads. The donor and acceptor beads were brought into proximity by the interactions between PPR proteins and biotin-RNAs, which were measured with different concentrations of biotin-RNAs or after addition of Zea 4 ssRNA, dsRNA, RNA/DNA duplex, ssDNA, and dsDNA to the reaction system. When excited by a laser beam of 680 nm, the donor beads emit singlet oxygen that activates thioxene derivatives in the acceptor beads, which release photons of 520-620 nm as the binding signal (see Fig. 2B). The experiments were conducted with 20 nM THA8L and biotin-Zea 1a RNA in the presence of 5 μg/ml donor and acceptor beads in 50 mM MOPS (pH 7.4), 50 mM NaF, 50 mM CHAPS, and 0.1 mg/ml bovine serum albumin. The results were based on an average of three experiments with standard errors typically <10% of the measurements. The IC50 values were derived from curve fitting based on a competitive inhibitor model for the binding of His6-THA8L and biotin-RNA using GraphPad Prism.
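The IC50 derivation can be illustrated with a generic one-site competition curve, signal = bottom + (top - bottom) / (1 + [competitor]/IC50). The sketch below is not the Prism procedure used in this study, whose exact equation and settings are not given here; it assumes a unit Hill slope and fits by scanning IC50 on a logarithmic grid while solving top and bottom by linear least squares. The data are synthetic and noise-free, and all function names are illustrative.

```python
import math
from typing import Sequence, Tuple


def competition_signal(conc: float, ic50: float, top: float, bottom: float) -> float:
    """One-site competition curve with unit Hill slope (assumed model)."""
    return bottom + (top - bottom) / (1.0 + conc / ic50)


def fit_ic50(
    concs: Sequence[float],
    signals: Sequence[float],
    log_lo: float = -1.0,
    log_hi: float = 5.0,
    steps: int = 6001,
) -> Tuple[float, float, float]:
    """Return (ic50, top, bottom) minimising the sum of squared residuals.

    IC50 is scanned on a log10 grid; once IC50 is fixed the model is linear
    in top and bottom, so those follow from ordinary least squares.
    """
    n = len(concs)
    mean_y = sum(signals) / n
    best_sse, best = math.inf, (math.nan, math.nan, math.nan)
    for i in range(steps):
        ic50 = 10.0 ** (log_lo + (log_hi - log_lo) * i / (steps - 1))
        frac = [1.0 / (1.0 + c / ic50) for c in concs]  # fraction of signal remaining
        mean_f = sum(frac) / n
        sff = sum((f - mean_f) ** 2 for f in frac)
        if sff == 0.0:
            continue  # all concentrations identical, slope undefined
        span = sum((f - mean_f) * (y - mean_y) for f, y in zip(frac, signals)) / sff
        bottom = mean_y - span * mean_f
        sse = sum((bottom + span * f - y) ** 2 for f, y in zip(frac, signals))
        if sse < best_sse:
            best_sse, best = sse, (ic50, bottom + span, bottom)
    return best


# Synthetic, noise-free competition data (arbitrary signal units, nM competitor).
demo_concs = [0.1, 1.0, 3.0, 10.0, 30.0, 100.0, 1000.0]
demo_signals = [competition_signal(c, 8.0, 1000.0, 50.0) for c in demo_concs]
fitted_ic50, fitted_top, fitted_bottom = fit_ic50(demo_concs, demo_signals)
```

A grid search is used here only to keep the example free of external dependencies; a nonlinear least-squares routine would normally be preferred for real data with replicate measurements.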
Gel Mobility Shift Assay-A gel mobility shift assay was performed to detect RNA binding by THA8L protein. Both Zea 4 and control HB9 RNA oligoribonucleotides were 5′-end-labeled with [γ-32P]ATP using T4 polynucleotide kinase according to the manufacturer's protocol (Invitrogen). The labeled RNA probe was separated from unincorporated nucleotides by centrifugation using a Sephadex G-25 quick-spin column (Roche Applied Science). The binding reaction contained 25 mM Tris (pH 8.0), 0.125 mM EDTA, 10% glycerol, 25 mM KCl, 1 ng of labeled RNA oligonucleotides, and various amounts of THA8L protein as indicated. Binding reactions were incubated for 30 min at room temperature and resolved on a 6% native polyacrylamide gel run in 0.5× Tris borate/EDTA buffer at 100 V. Results were visualized on a PhosphorImager (Fujifilm).
Analytic HPLC Gel Filtration-Size exclusion chromatography was performed using a Nanofilm SEC-500 column (Sepax Technology) in an SPD-20AV system (Shimadzu). The column was run on a Waters HPLC system at a flow rate of 0.35 ml/min with dual mode detection at 280 and 220 nm. The column was equilibrated with running buffer (20 mM Tris (pH 7.5), 50 mM NaCl, and 2% glycerol) to obtain a stable baseline. After that, 50 μl of ~1.0 mg/ml wild-type or mutant THA8L protein solutions were centrifuged, and the supernatant was loaded onto the column to detect the association state and the elution time of the main peak of THA8L solutions.
Mutagenesis-Site-directed mutagenesis was carried out using the QuikChange method (Stratagene) or the GeneTailor system (Invitrogen). Mutations and all plasmid constructs were confirmed by sequencing before protein expression. Expression and purification of His-SUMO-THA8L mutant proteins were performed as described above.

RESULTS AND DISCUSSION
Structure Determination of an AtTHA8-like PPR Protein-Because THA8 is the first small PPR protein with a defined phenotype and molecular function in RNA splicing (7), we focused our studies on this subfamily of PPR proteins. The THA8 proteins from maize and A. thaliana are highly conserved, with 75% sequence identity to each other (Fig. 1A). Sequence analysis indicated that there is a THA8L protein (gene locus At3G46870) in A. thaliana, with 25% sequence identity to THA8. All three PPR proteins share a similar arrangement of their secondary structural elements, indicating that they may function in a similar way in RNA recognition.
The above three proteins were expressed in Escherichia coli and purified for crystallization. Only THA8L (residues 60-257) produced high quality crystals that diffracted to 2.0 Å resolution. The THA8L structure was solved by the single-wavelength anomalous diffraction method using crystals grown from SeMet-substituted protein. The THA8L structure contains 11 α-helices; the first 10 helices comprise five PPR motifs, and the final short helix helps to cap the fifth PPR motif (Fig. 1B). The first PPR motif is L-type, followed by three P-type motifs and a shorter S-type motif that has 31 amino acids (Fig. 1B; see a more detailed explanation in Fig. 6A). The overall arrangement of the five-and-one-half PPR motifs is a relatively straight rectangular box (Fig. 1B). The structure also reveals that one side of THA8L has much higher surface positive charge potential than the other side of THA8L, which may be involved in RNA binding (Fig. 1C).

FIGURE 1. Structure of Arabidopsis THA8L. A, sequence alignment of Zea mays THA8 (ZeTHA8), AtTHA8, and AtTHA8L. The secondary structure elements are indicated below the sequences. Chloroplast signal peptides are excluded from the sequences. The alignment was done by ClustalW. B, two 90° views of the THA8L structure. Note that the THA8L structure contains 11 α-helices; the first 10 helices comprise five PPR motifs, and the final short helix helps to cap the fifth PPR motif. The first PPR motif is L-type, followed by three P-type motifs and a shorter S-type motif that has 31 amino acids. C, two views of THA8L surface charge potential with blue for positive charges, red for negative charges, and white for neutral surface.
THA8L Binds to Specific RNA Sequences in ycf3 Intron 2-It has been reported that ycf3 intron 2 contains three THA8-binding sites within fragments 1a, 2, and 4, each having a length of ~150 nucleotides (7), but the exact binding sequences were not defined. We reasoned that the corresponding THA8-binding sites in ycf3 intron 2 should be relatively conserved in maize, rice, Arabidopsis, and Glycine max. Sequence alignment of ycf3 intron 2 from these species indeed revealed conserved segments enriched with GAA and UU sequences (Fig. 2A).
To determine whether these conserved segments are the binding sites of THA8/THA8L, we developed an AlphaScreen assay to detect the PPR protein-RNA interactions, as diagramed in Fig. 2B. In this assay, 5′-biotinylated RNA was attached to donor beads, and His6-tagged THA8 protein was attached to acceptor beads. When the donor and acceptor beads are brought into proximity by the interaction between THA8 and RNA, illuminating the sample with a 680-nm laser causes singlet oxygen transfer from donor beads to acceptor beads and elicits strong light emission at a shorter wavelength (520-620 nm). As shown in Fig. 2C, the Zea 1a RNA fragment, a 21-nucleotide fragment that encompasses the GAA conserved sequence, elicited a strong binding signal with THA8L protein at a concentration of 12.5 or 50 nM. In contrast, THA8L showed little interaction with the control RNAs PB7, HB9, and HCB, which are the corresponding RNA-binding sites for two unrelated PPR proteins, PPR10 (PB7) and HCF152 (HB9 and HCB) (24,25), even when these control RNAs were present at a concentration of 200 nM, indicating specific binding of THA8L to the Zea 1a RNA fragment. Similarly, THA8 proteins from maize and Arabidopsis also showed strong interactions with the Zea 1a RNA fragment (Fig. 2D). In addition, the gel mobility shift assay confirmed the specificity of THA8L-Zea 4 binding, whereas binding between THA8L and control RNA could not be detected (Fig. 2E). Because our crystal structure is of the THA8L protein, we focused biochemical and mutagenesis studies on this protein.

FIGURE 2. Zea 1a, Zea 2, and Zea 4 are three putative binding sites of THA8 in the ycf3 intron 2. B, diagram of AlphaScreen assays for detecting THA8L-RNA binding. C, THA8L-RNA binding was detected by AlphaScreen assay. Significant binding signals were detected with 12.5 and 50 nM biotin-Zea 1a RNA and 100 nM His6-AtTHA8L. HB9, PB7, and HCB (control biotin-RNAs that bind other PPR proteins) showed weak or no binding to His6-THA8L. All RNA sequences used for binding assays are listed below. D, RNA binding assay of THA8 and AtTHA8 with biotin-Zea 1a RNA at a ratio of 1:1 showed that THA8 has the highest affinity for biotin-Zea 1a RNA. AtTHA8 showed a significant RNA binding signal at a concentration of 80 nM, whereas THA8 showed significant binding to biotin-Zea 1a RNA at concentrations of 12.5 and 20 nM. His6-ZeTHA8, His-Z. mays THA8. E, gel mobility shift assay was performed to detect RNA binding by THA8L protein. Increasing amounts of THA8L (0, 1, 4, and 8 μM) were incubated with 1 ng of labeled Zea 4 RNA or HB9 (control RNA). Protein-RNA complex formation was detected for Zea 4 RNA, but not for HB9 RNA (negative control).
Defining the Minimum THA8L-binding Site-Considering that THA8L has just five-and-a-half PPR motifs, we reasoned that only a subset of the 21 nucleotides in the Zea 1a RNA fragment might be required for THA8L binding based on the proposed model of one PPR motif/one nucleotide (26). To determine the minimum sequence required for THA8L binding, we synthesized nine RNA oligonucleotides with progressive 5′- and 3′-truncations of the original 21 nucleotides (Fig. 3). These RNAs were made without a biotin tag and were used to compete for the interaction between biotin-Zea 1a RNA and His-THA8L. As shown in Fig. 3, RNA-1 to RNA-7 efficiently competed for the binding between biotin-Zea 1a RNA and His-THA8L, whereas RNA-8 and RNA-9 largely lost the ability to compete, indicating that 11 nucleotides (RNA-7, AGGAAAUUUUC) represent the minimum length for binding THA8L. The difference between RNA-7 and RNA-8 is a single A at the 5′-end, suggesting that this A base may play a key role in THA8L protein-RNA interaction.
RNA-5 with 13 nucleotides appeared to be a better competitor than RNA-7. We thus made the corresponding 13-nucleotide RNAs from rice, Arabidopsis, and G. max to assess which RNAs have the best binding affinity for THA8L (Fig. 2A). We also made the two putative 13-mer RNAs from fragments 2 and 4 in ycf3 intron 2 (Fig. 2A). The relative affinities of these RNAs for THA8L were determined by their ability to compete with the binding between biotin-Zea 1a RNA and His-THA8L using relatively low concentrations of competitor RNAs (500 nM) (Fig. 4A). Based on these data, the order of affinity of these RNAs for THA8L is as follows: Zea 4 > Oryza sativa > Zea 1a >> Zea 2 > G. max = A. thaliana. It thus appears that Zea 4 with the sequence AAGAAGAAAUUGG has the best binding affinity for THA8L with an IC50 value of 8 nM (Fig. 4C).
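The purine enrichment of the binding sites can be made concrete by counting A and G residues in the two sequences given in the text, the Zea 4 13-mer and the 11-nucleotide RNA-7 minimal site. The short sketch below does only that; the function name is illustrative, and purine content by itself is a descriptive statistic, not a predictor of the affinities measured here.

```python
PURINES = frozenset("AG")
RNA_ALPHABET = frozenset("ACGU")


def purine_fraction(rna: str) -> float:
    """Fraction of A and G residues in an RNA sequence."""
    seq = rna.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    unknown = set(seq) - RNA_ALPHABET
    if unknown:
        raise ValueError(f"unexpected characters: {sorted(unknown)}")
    return sum(base in PURINES for base in seq) / len(seq)


ZEA_4 = "AAGAAGAAAUUGG"  # 13-mer reported as the best competitor
RNA_7 = "AGGAAAUUUUC"    # 11-nucleotide minimal site from Zea 1a

for name, seq in (("Zea 4", ZEA_4), ("RNA-7", RNA_7)):
    print(f"{name}: {len(seq)} nt, purine fraction {purine_fraction(seq):.2f}")
```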
THA8L Specifically Recognizes ssRNA-PPR proteins are known to be RNA-binding proteins, but it is not known whether they prefer ssRNAs, hairpin RNA, or duplex RNA. To assess the binding preference of THA8L, we made Zea 4 ssRNA and its corresponding dsRNA as well as ssDNA and dsDNA. As shown in Fig. 4 (B and C), THA8L strongly preferred ssRNA over dsRNA or RNA/DNA duplex. THA8L did not bind to ssDNA or dsDNA, suggesting that the additional 2′-hydroxyl group in the ribose of RNA is important for the binding of THA8L. The strong preference of THA8L for ssRNA is consistent with the proposed function of the PPR motif in RNA binding and splicing.
Charge Interactions Are Important for RNA Binding by THA8L-We reasoned that the relatively high affinity binding of Zea 4 RNA by THA8L must involve extensive charge interactions between the positively charged residues of PPR motifs and the negatively charged phosphate backbone of RNA. To map these positively charged residues, we systematically mutated all arginines and lysines on the surface of the THA8L structure to the negatively charged glutamic acid. All of these mutated proteins were expressed and purified in the same way as the wild-type protein for AlphaScreen RNA binding assays (Fig. 5A). Twelve of the 19 charge-reversal mutations reduced RNA binding, and mutations of Lys-75, Arg-104, Lys-115, and Arg-119 nearly abolished RNA binding (Fig. 5A). All of these mutations are located on the same side of the surface as the predicted RNA code-determining residues of THA8L protein (Fig. 6B). All of the mutant proteins were expressed and purified well (Fig. 5B), and analytic HPLC gel filtration revealed that they eluted with profiles nearly identical to that of the wild-type protein (Fig. 5C), indicating that their loss of RNA-binding function is not due to misfolding of proteins.
FIGURE 5. B, SDS-PAGE of the purified protein samples of all 19 mutant THA8L proteins used for RNA binding assay. C, analytic HPLC gel filtration confirmed that all of the mutant proteins were properly folded. No aggregation/misfolding signals could be detected, and the main peaks of wild-type THA8L and mutants K75E, R104E, and R119E appeared at ~5.5 min. mAU, milli-absorbance units.

THA8L RNA-binding Codes-It has been proposed that PPR tracts bind specific RNA nucleotides via the combinatorial action of two amino acids in each repeat (1,13). The combinatorial amino acid code for nucleotide recognition by P-type PPR motifs was proposed to be as follows: T6D1′ = G, T/S6N1′ = A, N6D1′ = U, and N6N/S1′ = C, where residue 6 is the sixth residue of a given PPR motif and residue 1′ is the first residue of the next PPR motif. The S-type PPR motif may have a similar amino acid code, but there is no code proposed for L-type PPR motifs (1,13). Sequence alignment of the five PPR motifs of THA8L indicated that motifs 2, 3, and 4 are P-type, whereas motifs 1 and 5 are L- and S-type, respectively (motif 1 does not have the C-terminal conserved proline, and motif 5 contains 31 amino acids with a short three-turn helix) (Fig. 6A). On the basis of the proposed amino acid codes, we predicted that motif 2 with an Ala/Asp combination at position 6/1′ prefers G; motif 3 with a Lys/Asp combination prefers G/A; motif 4 with a Thr/Glu combination prefers A; and motif 5, an S-type motif with an Arg/Glu combination, prefers G/A. Our assignment of the binding code is consistent with the preference of THA8L for G/A-rich sequences (Fig. 2A). As shown in Fig. 6A, the "coding" amino acids of THA8L are located on top of the positively charged residues whose mutations affect RNA binding. We propose a model with specific base contacts on top of the PPR motifs and the corresponding phosphate backbone contacts near the bottom of the PPR motifs (Fig. 6B).
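The proposed two-residue code lends itself to a simple lookup table. The sketch below encodes only the four canonical P-type combinations quoted above (positions 6 and 1′) and returns no prediction for any other pair. Note that the four THA8L combinations discussed in the text (Ala/Asp, Lys/Asp, Thr/Glu, and Arg/Glu) are not entries in this canonical table, so the preferences assigned to them above reflect the authors' interpretation and are not a direct lookup. The names used in the code are illustrative.

```python
from typing import Dict, Optional, Tuple

# Proposed P-type code as quoted in the text: (residue 6, residue 1') -> nucleotide.
PPR_CODE: Dict[Tuple[str, str], str] = {
    ("T", "D"): "G",
    ("T", "N"): "A",
    ("S", "N"): "A",
    ("N", "D"): "U",
    ("N", "N"): "C",
    ("N", "S"): "C",
}


def predict_nucleotide(res6: str, res1_prime: str) -> Optional[str]:
    """Look up the nucleotide for a residue pair; None if the pair is not in the table."""
    return PPR_CODE.get((res6.upper(), res1_prime.upper()))


# Residue pairs at positions 6/1' reported for THA8L motifs 2-5 (one-letter codes).
THA8L_PAIRS = {2: ("A", "D"), 3: ("K", "D"), 4: ("T", "E"), 5: ("R", "E")}

canonical_hits = {
    motif: predict_nucleotide(*pair) for motif, pair in THA8L_PAIRS.items()
}
```

Every value in `canonical_hits` is `None`, which makes explicit that the THA8L assignments extend the canonical table by similarity to it.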
Concluding Remarks-In this study, we solved the crystal structure of THA8L, a small protein that contains five-and-one-half PPR motifs. We also identified the small RNA-binding site for THA8L and demonstrated that THA8L has a strong preference for ssRNA over ssDNA or dsRNA. The THA8L structure reveals that PPR motifs adopt helix-turn-helix repeats that are packed into a relatively rigid rectangular structure (Fig. 1B). Analysis of amino acid codes for RNA recognition suggested that THA8L prefers a short G/A-rich sequence. On the basis of the location of the amino acid codes and the distribution of positively charged surface, we proposed a THA8L RNA-binding model with specific base contacts near the top of the PPR motifs and the corresponding phosphate backbone contacts near the bottom of the PPR motifs. Validation of such a model requires the structure of a THA8L-RNA complex, which we have failed to obtain despite extensive screening of crystallization conditions with various RNA fragments. Nevertheless, the THA8L structure and the identification of its RNA-binding site as reported here provide important insights into THA8L-RNA interactions and establish THA8L as a sequence-specific RNA-binding protein that might have a function similar to that of THA8 in RNA processing.