Novel Selenoproteins Identified in Silico and in Vivo by Using a Conserved RNA Structural Motif

Selenocysteine is incorporated into selenoproteins by an in-frame UGA codon whose readthrough requires the selenocysteine insertion sequence (SECIS), a conserved hairpin in the 3′-untranslated region of eukaryotic selenoprotein mRNAs. To identify new selenoproteins, we developed a strategy that obviates the need for prior amino acid sequence information. A computational screen was used to scan nucleotide sequence data bases for sequences presenting a potential SECIS secondary structure. The computer-selected hairpins were then assayed in vivo for their functional capacities, and the cDNAs corresponding to the SECIS winners were identified. Four of them encoded novel selenoproteins, as confirmed by in vivo experiments. Among these, SelZf1 and SelZf2 share a common domain with mitochondrial thioredoxin reductase-2; the three proteins, however, possess distinct N-terminal domains. Another protein, SelX, displays sequence similarity to a protein involved in bacterial pilus formation. This is the first report of novel selenoproteins discovered through a computational screen for the RNA hairpin directing selenocysteine incorporation.

Selenium is an essential trace element whose deficiency can interfere with normal embryonic development and fertility or favor the appearance of certain cancers and viral diseases such as human immunodeficiency virus and coxsackievirus (1). The amino acid selenocysteine is the major biological form of selenium in bacteria and animals. It is found in the active site of selenoproteins and is directly involved in the catalytic reaction. In this regard, the capacity of the selenocysteine selenol group to become ionized at physiological pH, the cysteine thiol group requiring a higher pH, accounts for the higher rate of catalysis of selenoenzymes (2). Seven selenoprotein families have been characterized so far in mammals (3): the glutathione peroxidase and thioredoxin reductase families, involved in scavenging reactive oxygen species and maintaining the redox status of the cell; three iodothyronine deiodinases participating in the thyroid hormone metabolism; and last, SelW and SelP, which have not been attributed a function yet. More recently, a 15-kDa selenoprotein of unknown function has been purified (4). Selenophosphate synthetase-2, the seventh selenoprotein, is remarkable in that it contains selenocysteine, but is also a key actor in the biosynthesis of this amino acid (5).
Selenocysteine is encoded by an in-frame UGA codon, implying the existence of a mechanism capable of distinguishing the UGA selenocysteine codon from a translational stop. In eukaryotes, this process requires the selenocysteine insertion sequence (SECIS), a hairpin residing in the 3′-untranslated region of selenoprotein mRNAs that is essential for readthrough of the UGA selenocysteine codon (6). Sequence comparisons and structure-function experiments generated a consensus secondary structure model for the SECIS element in which a functional motif could be identified (7,8).
Compelling evidence for the existence of molecular links between selenium deficiencies and biological disorders came from molecular genetics experiments. Targeted disruption of the mouse selenocysteine tRNA gene led to early embryonic lethality, implying that selenoprotein synthesis is essential to mammals (9). Studies carried out on knockout mice lacking the glutathione peroxidase underlined the protective role of selenium against free radicals (10) or coxsackievirus-induced myocarditis in Keshan disease (1). Further supporting the biological importance of this trace element, selenium labeling experiments in rats determined the existence of more selenoproteins to be identified and characterized (11). To undertake this task, we intended here to exploit the mine of information stored in EST data bases. The central question in such a project is how the relevant cDNAs can be retrieved without the knowledge of even a partial protein sequence. To circumvent the obstacle, a strategy was developed based on the absolute requirement of a SECIS element for selenoprotein translation. The finding of such a hairpin in a cDNA should therefore signal the presence of an attached coding sequence. Two assets were exploited to extract new SECIS elements from EST data bases. The first one was the detailed knowledge of the secondary structure of the SECIS element, which is conserved in all known selenoprotein mRNAs. The second one was the utilization of a program capable of detecting potential RNA secondary structures in nucleotide sequence data bases. Combined with molecular biology and in vivo experiments, this approach led to the discovery of four novel selenoproteins using a single RNA element as a structural tag.

EXPERIMENTAL PROCEDURES
Computational Screen and Sequence Comparisons-The search for new SECIS elements was conducted in the GenBank™, sequence-tagged site, and EST data bases using the RNAMOT pattern search program (12,13) and the descriptor shown in Fig. 1A. A total of 600,300 3′- and 5′-ESTs were scanned, representing ~222 × 10⁶ nucleotides. Positive hits were aligned with ClustalW (14). The same descriptor run against a randomized sequence of 10⁷ nucleotides (A, T, G, and C frequencies, 25% each) yielded three hits. ORFs and ESTs were identified by BLAST searches (15) in the GenBank™ and EST data bases and aligned with ClustalW.
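The randomized-sequence control above gives a rough feel for how many chance matches the descriptor might produce in the EST search. The short calculation below is only an illustrative sketch: it assumes that chance hits scale linearly with sequence length and that a uniform 25% base composition is a fair stand-in for EST sequences, neither of which is claimed in the text. The function name is hypothetical.

```python
# Figures taken from the text above.
random_hits = 3              # hits on the randomized sequence
random_length = 10**7        # nucleotides in the randomized sequence
est_length = 222 * 10**6     # approximate nucleotides scanned in the EST data base


def expected_chance_hits(hits, control_length, target_length):
    """Linear extrapolation of the chance-hit rate (an assumption, not a
    procedure described in the paper)."""
    return hits * target_length / control_length


expected = expected_chance_hits(random_hits, random_length, est_length)
print(f"Expected chance hits in the EST set: {expected:.1f}")
```

Under these assumptions the extrapolation is a few dozen chance matches, which is one way to see why additional subscreens and functional assays are needed after the pattern search.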
Cloning of the New SECIS Elements-The new SECIS elements were obtained by standard PCR amplification of a human B cell library or of human or mouse genomic DNAs (gifts of S. Elledge, J. L. Mandel, and F. Guillemot, respectively) with oligonucleotides GGGTGATCAGGGGT(N)₂₄ and CGGGGTACCTGGAT(N)₂₄ as the 5′- and 3′-primers, respectively. (N)₂₄ corresponds to 24 nucleotides complementary to the SECIS sequence, including the top 4 base pairs of helix I (see Fig. 1A). SECIS AA109465 was constructed by nested PCR. The PCR primers introduced a BclI site at the 5′-end and a KpnI site at the 3′-end of the SECIS elements, in addition to a 4-bp stem below helix I (see Fig. 1B). To replace the naturally occurring SECIS element in the glutathione peroxidase reporter, the SECIS candidates were introduced into pGHA-BcK at the BclI-KpnI sites (8). This plasmid encodes a triple-HA tag fused in-frame to the N terminus of the glutathione peroxidase coding sequence (8).
Identification and Cloning of the cDNAs Encoding the Novel Selenoproteins-ESTs corresponding to the functional SECIS elements were identified by querying EST data bases with BLASTN at NCBI. Sequences were aligned with the CAP program (16), producing a contiguous sequence. GenBank™ accession numbers AA180412, AA057045, H44779, and R44842, corresponding to the longest cDNA clones identified for SelN, SelX, SelY, and SelZ, respectively, were purchased from Genome Systems. Longer cDNAs, AF007144 for SelY and R47273 for SelN, were kindly provided by W. Yu and M. M. Y. Waye.
A 1333-bp cDNA fragment corresponding to SelX was identified by screening a HeLa oligo(dT) library (a gift of P. Chambon) with a probe spanning positions 1-197 of AA057045. This EcoRI-XhoI fragment in pBluescript KS was called pSelX. For SelN, the sequence alignment showed that cDNA R47273 overlapped the 5′-most 578 bp of AA180412. R47273 and AA180412 were entirely sequenced and fused by ligation of the 702-bp XbaI fragment of R47273 to XbaI-digested AA180412, yielding pSelN2, a 2742-bp EcoRI-XhoI fragment in pBluescript SK. Another fragment of 2066 bp, overlapping the 1544 bp 5′ to pSelN2, was obtained by screening a HeLa random-primed library (a gift of P. Chambon) with a probe complementary to positions 1-702 of the R47273 XbaI fragment, giving rise to plasmid pSelN3. The 1543-bp XbaI fragment of pSelN3 was inserted into the XbaI-digested plasmid containing AA180412, generating pSelN4. Additional 5′-sequences of pSelN4 were obtained by 5′-Marathon RACE using the human prostate Marathon-Ready cDNA and the Advantage cDNA PCR kit (CLONTECH). The PCR fragment obtained was digested by NotI-EheI, and the resulting 999-bp fragment was ligated to the NotI-EheI-digested pSelN4 plasmid, yielding pSelN, a 3955-bp NotI-XhoI fragment in pBluescript SK. The cDNA R44842 containing the 1505-bp HindIII-NotI fragment in pLafmid BA was entirely sequenced and named pSelZ. As for SelN, additional 5′-sequences were obtained by 5′-Marathon RACE, giving rise to the 1170-bp (M15) and 1150-bp (M19) PCR fragments, which differ in sequence. Into the blunt-ended HindIII-SmaI-digested pSelZ plasmid was inserted either the 1121-bp SmaI fragment from M19 or the 1141-bp SmaI fragment from M15 to generate the pSelZf1 (2021 bp) and pSelZf2 (2041 bp) cDNAs, respectively.
cDNA Constructs for in Vivo Expression of SelX, SelN, and SelZ-The cDNAs coding for the different proteins, either with or lacking the SECIS elements, were inserted into the eukaryotic expression vector pXJ41 (a gift of P. Chambon) under the transcriptional control of the cytomegalovirus promoter. A triple-HA tag was fused in-frame to the N termini of SelX, SelN, and SelZ by incorporating, by site-directed mutagenesis, PstI or HindIII sites into pGHA-BcK, downstream of the HA tag sequence, with oligonucleotide GCTCAGTGCGGCCGCTCGTTCTGCAGTCTGCTGCTCGGCTC or GCTCAGTGCGGCCGCGAAGCTTCTGCTGCTCGGCTC (containing the PstI site CTGCAG or the HindIII site AAGCTT, respectively), generating constructs pGHA-BcK+PstI and pGHA-BcK+HindIII, respectively. PstI-StuI digestion of pSelX generated the 1040-bp PstI-StuI fragment that was ligated to the blunt-ended PstI-BglII-digested pGHA-BcK+PstI plasmid to produce pHASelX. Ligation of the 739-bp blunt-ended PstI-HindIII fragment from pSelX to the blunt-ended PstI-BglII-digested pGHA-BcK+PstI plasmid generated pHASelXΔSECIS. The 139-bp EcoRI-HindIII fragment from pGHA-BcK+HindIII was ligated to EcoRI-HindIII-digested pXJ41, resulting in construct pXJ(HA)₃. A HindIII restriction site was introduced into pSelN by site-directed mutagenesis with oligonucleotide CGGCCGCCCGGGCAAGCTTACATCAGCCC (containing the HindIII site AAGCTT), with the last T of the site corresponding to the first base of the first codon identified in SelN (position 2 in SelN), yielding pSelN+HindIII. HindIII-KpnI digestion of pSelN+HindIII generated a 3973-bp fragment that was inserted into HindIII-KpnI-cleaved pXJ(HA)₃ to generate pHASelN. The 2314-bp blunt-ended HindIII-NheI fragment from pSelN+HindIII was inserted into the blunt-ended HindIII-KpnI-cleaved pXJ(HA)₃ vector to generate pHASelNΔSECIS.
A BamHI site was introduced by site-directed mutagenesis into pSelZ with oligonucleotide GGCCTGCAGGGATCCCGCTTACCCTC or GCGGCCGCAGGAATGGATCCTCTTTATTTGCATTGC (each containing the BamHI site GGATCC) at either position 1137 (3′ adjacent to the TAA stop codon) or 1469 (13 bp upstream of the poly(A) tail), respectively. This gave rise to constructs pSelZ-Bamsh and pSelZ-Bamlg, respectively. The 1140- and 1471-bp HindIII-BamHI fragments, arising from HindIII-BamHI digestions of pSelZ-Bamsh and pSelZ-Bamlg, were subcloned into the HindIII-BglII-digested pXJ(HA)₃ vector, giving rise to pHASelZΔSECIS and pHASelZ, respectively; the shorter fragment ends immediately after the stop codon and therefore lacks the SECIS element.
Transfection of COS-7 Cells, ⁷⁵Se Labeling, and Glutathione Peroxidase Assays-COS-7 cells were cultured in Dulbecco's modified Eagle's medium supplemented with 10% fetal bovine serum, 2 mM L-glutamine, and 0.1 mg/ml gentamycin according to standard cell culture procedures. Transient transfections were carried out by calcium phosphate precipitation as described (8), with 5 μg of test DNA, 4 μg of selenocysteine tRNA expression vector, and 1 μg of plasmid LacZ-cytomegalovirus as the transfection standard. Sodium selenite (10 nM) was added to the culture medium. Cells were washed after 16 h and harvested 24 h later by scraping. Lysis was carried out by the freeze-thaw procedure in 50 μl of 100 mM Tris-HCl (pH 8). For protein analysis, the lysis buffer was adjusted to 20 mM HEPES-NaOH (pH 7.9), 12.5 mM MgCl₂, 150 mM KCl, 0.1 mM EDTA, 10% glycerol, and 0.5% Tween and further incubated on ice for 20 min. The crude cell extract was then centrifuged at 4°C for 3 min at 13,000 × g to remove cell debris. The supernatant was used for subsequent analysis. For ⁷⁵Se labeling, 6 μCi of Na₂⁷⁵SeO₃ (2.5 μCi/μg selenium; University of Missouri Research Reactor) were added to each 100-mm plate 24 h after transfection of the plasmids. Cells were further incubated for 20 h before harvesting. Lysis was as described above.
Western blot analysis, normalized to β-galactosidase activities, was performed as described (8). For the glutathione peroxidase (GPx) activity assays (8), the HA tag was removed by NotI digestion followed by self-ligation. Prior to GPx activity measurements, β-galactosidase activities were assayed with 5 μl of crude cell extract to normalize the results. Assays were performed in triplicate.
Immunoprecipitations-The HA-tagged proteins were immunoprecipitated by incubating 25 μl of lysis supernatant with 30 μl of anti-HA antibody 12CA5 linked to protein A-Sepharose beads in a total volume of 250 μl of lysis buffer for 1 h at room temperature. The beads were spun down, washed four times in 200 μl of lysis buffer for 15 min, mixed with 20 μl of loading buffer (100 mM Tris-HCl (pH 6.8), 150 mM dithioerythritol, 4% SDS, 20% glycerol, and 0.2% bromphenol blue), heated in boiling water for 3 min, and centrifuged.

RESULTS
A Computational Screen for New SECIS Elements-To scan for sequences that could adopt secondary structures similar to the SECIS element, we developed a computational screen based on the pattern search program RNAMOT (12,13). An input primary/secondary structure descriptor (Fig. 1A) for RNAMOT was inferred from sequence comparisons and the SECIS consensus structure experimentally determined at the time of the search (7,8). To test the validity of the descriptor, RNAMOT was run against the GenBank™ non-redundant data base (10⁹ nucleotides at the time), generating 34 different SECIS elements belonging to the then known selenoprotein mRNAs. An additional hit (M35391 in Fig. 1C) was found in an intron of the human procollagen α2 chain gene. Given its localization, it is not likely to represent a bona fide SECIS element. However, it was retained because it contained all the features of the SECIS consensus structure. Also, a search with an alternative descriptor carrying N instead of B at the top base pair of the non-Watson-Crick quartet led to the discovery of a SECIS element in the 3′-UTR of the selenophosphate synthetase-2 cDNA. This cDNA was characterized earlier, but no SECIS element could be found by the authors (5).
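To make the idea of a structure-based pattern search concrete, here is a deliberately simplified sketch. It is not RNAMOT and does not reproduce the descriptor of Fig. 1A (which is not given in the text); it only scans a sequence for windows whose two ends can form a stem of Watson-Crick or G·U pairs around a loop of bounded size. The function name, the default stem length, and the loop bounds are illustrative assumptions.

```python
# Base pairs accepted in the stem: Watson-Crick plus G.U wobble.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def find_hairpins(seq, stem=4, loop_min=3, loop_max=8):
    """Return (start, end, loop_length) for every window seq[start:end]
    whose first and last `stem` nucleotides can pair with each other.
    Toy parameters only; a real SECIS descriptor is far more constrained."""
    seq = seq.upper().replace("T", "U")  # accept cDNA or RNA input
    hits = []
    n = len(seq)
    for start in range(n - 2 * stem - loop_min + 1):
        for loop in range(loop_min, loop_max + 1):
            end = start + 2 * stem + loop  # exclusive end of the window
            if end > n:
                break
            five_arm = seq[start:start + stem]
            three_arm = seq[end - stem:end]
            # The 5' arm pairs antiparallel with the 3' arm.
            if all((a, b) in PAIRS for a, b in zip(five_arm, reversed(three_arm))):
                hits.append((start, end, loop))
    return hits


print(find_hairpins("GGGCAAAAGCCC"))
```

A descriptor-driven tool additionally constrains specific positions (for example, the non-Watson-Crick quartet mentioned above), which is what makes the search selective enough for data-base-scale screening.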
In a second step, the search was conducted in the GenBank™ EST data base (222 × 10⁶ nucleotides). After discarding ambiguous hits containing one or more undefined nucleotides, RNAMOT found 376 sequences, including 153 mouse, 101 human, 92 Brugia malayi, and 30 other animal and plant ESTs. A sequence alignment was performed with ClustalW (14), and we plotted the derived neighbor-joining tree to obtain a clustered representation of the matches. This identified 62 individual sequences that could be classified into three families. One family comprised sequences corresponding to the known SECIS elements; another contained groups of unknown SECIS; and the last one contained orphans represented by one or two ESTs only. RNA sequences carrying several AU or GC repeats, prone to adopt alternative secondary structures by changes in the base pairing register, were rejected because of their low biological significance. The remaining elements were assessed in terms of stability. Some sequences were discarded because of their low thermodynamic stability, caused by too many consecutive G·U base pairs either 5′ to the non-Watson-Crick quartet or below the apical loop. Seventeen SECIS candidates were eventually obtained that met the requirements imposed by the different subscreens and presented the features of the SECIS consensus element. For easy correspondence with the EST sequences, the SECIS elements were named after the accession number of one of their parental ESTs (Fig. 1C). All the sequences belonged to human or mouse ESTs, except AA109465, which was a member of a family of 92 B. malayi ESTs. Running the program against the GenBank™ sequence-tagged site data base generated nine sequences. Only four of them, accession numbers L18002, Z16689, Z74617, and Z75892 (Fig. 1C), were successful in the subsequent screens. The last two corresponded to the R16491 and R23284 ESTs characterized in the GenBank™ EST data base search.
After this first round of selection, 21 SECIS candidates were obtained, comprising 2 cDNAs, 17 ESTs, and 2 sequence-tagged sites.
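One of the subscreens described above rejects hairpins with too many consecutive G·U pairs. The sketch below shows how such a check could be expressed for a stem given as its two arms. The text does not state a numerical cutoff, so the `max_run` threshold and both function names are hypothetical.

```python
def max_consecutive_gu(five_arm, three_arm):
    """Longest run of consecutive G.U wobble pairs in a stem.
    Both arms are written 5'->3'; five_arm[i] pairs with three_arm[-1 - i]."""
    five_arm = five_arm.upper().replace("T", "U")
    three_arm = three_arm.upper().replace("T", "U")
    longest = run = 0
    for a, b in zip(five_arm, reversed(three_arm)):
        if {a, b} == {"G", "U"}:
            run += 1
            longest = max(longest, run)
        else:
            run = 0
    return longest


def passes_gu_filter(five_arm, three_arm, max_run=2):
    """Hypothetical filter: the cutoff of 2 is an illustrative assumption."""
    return max_consecutive_gu(five_arm, three_arm) <= max_run


# Stem with pairs G-C, G.U, U.G, G.U, C-G: three wobble pairs in a row.
print(max_consecutive_gu("GGUGC", "GUGUC"))
```

A thermodynamic evaluation would normally rely on a folding-energy model; counting wobble runs is only a crude proxy for the stability criterion mentioned in the text.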
Functional Assays of the Selected SECIS Candidates-The 21 SECIS candidates were then tested for in vivo function. The SECIS DNAs were obtained by PCR amplification of genomic DNA or cDNA libraries. Because the stability of helix I varies among the different SECIS elements, an identical 4-bp stem was added below helix I in all SECIS elements during the PCR amplification (Fig. 1B) so that the SECIS RNAs would exhibit similar stabilities. GPx being a selenoprotein, its translation requires a functional SECIS element in the 3′-UTR of its mRNA. The SECIS DNA candidates were therefore introduced separately into the 3′-UTR of a GPx cDNA reporter to replace the resident SECIS element. In this construct, the GPx coding sequence carries an HA tag fused in-frame at the N terminus to allow detection of the translated proteins with the anti-HA antibody. The activity of the SECIS candidates could be assessed by a rapid assay involving transfection of the constructs into COS-7 cells, followed by Western blotting. A functional SECIS candidate should lead to translation of a full-length GPx. In contrast, with an inactive SECIS element, the UGA selenocysteine codon will be recognized as a stop codon, leading to translation of a shortened 9.5-kDa polypeptide. Translation of the mRNA coding for the HA-tagged GPx, carrying its own SECIS, generated a product of ~27 kDa (Fig. 2A, lane 2). Construct GPx-mutSECIS carried the G·A/A·G to A·G/G·A substitution in the non-Watson-Crick quartet of the SECIS element that impaired its function (8), and it accordingly yielded only minute amounts of GPx (compare lanes 2 and 3). This construct provided the background level. Consistent with earlier observations (8), no 9.5-kDa protein appeared with GPx-mutSECIS, presumably due to the instability of such an unnatural short polypeptide in vivo. Fig. 2A shows that, among the 21 SECIS elements tested, only R71722, AA057045, selenophosphate synthetase-2, AA107841, R46598, and R44842 could mediate production of a full-length GPx with an efficiency comparable to that of the authentic GPx SECIS (compare with lanes 2 and 3). A seventh element, AA280511 (lane 13), also produced full-length GPx, but with a lower efficiency.
Since the active site of GPx contains an essential selenocysteine, measuring the enzymatic activity will attest that this amino acid was effectively incorporated into the protein. After transfection into COS-7 cells of the cDNA constructs carrying the SECIS candidates, GPx activities were assayed from crude cell extracts and compared with that of wild-type GPx (Fig. 2B, bar 2). As anticipated, no significant activity emanated from GPx-mutSECIS (bar 3). Wild-type or slightly higher than wild-type activities were observed with R71722 (105%), AA057045 (110%), and R46598 (~100%). AA107841, selenophosphate synthetase-2, and R44842 retained 80, 73.5, and 54% of the wild-type activity, respectively. The activity dropped to 20% with AA280511 (bar 13). A correlation between the two approaches could thus be established, showing that those SECIS candidates producing full-length GPx also conferred wild-type or significant GPx activity.
Synthesis of a full-length, enzymatically active GPx could be obtained with seven of the selected SECIS elements, indicating that they were capable of promoting selenocysteine insertion. Possible explanations for the inactivity of the other candidates will be discussed.
Identification of the cDNAs Harboring the New Functional SECIS Element-In the previous assay, we functionally characterized the SECIS element of the selenophosphate synthetase-2 mRNA. Next, we sought the open reading frames lying upstream of the remaining new SECIS elements. ESTs physically linked to each SECIS element were searched in the GenBank™ EST data base with BLASTN. The EST sequences collected after an iterative BLASTN search were processed with the CAP program (16) to assemble one contiguous cDNA sequence. The longest cDNAs were obtained and sequenced. The sequence of the cDNA that we found linked to SECIS AA280511 revealed that the SECIS element in fact resides on the opposite strand relative to the putative ORF. Although this hairpin may constitute a bona fide SECIS element, we could not identify an ORF in the proper orientation. We found that SECIS elements AA107841 and R46598 corresponded to the SECIS elements of selenoprotein mRNAs characterized while our study was underway. Indeed, the sequence of the cDNA linked to SECIS AA107841 was found to be identical to that of the 15-kDa selenoprotein (4). The length of the mRNA bearing SECIS R46598, which we call SelY, was estimated to be 6 kilobases by Northern blot analysis (Fig. 3, lanes 7 and 8). This size suggested that it could correspond to the mRNA of type 2 iodothyronine deiodinase, whose coding frame, deprived of the 3′-UTR, was isolated earlier (17). Our cloning and sequencing of the SelY cDNA showed that it was identical to the 3′-UTR of type 2 iodothyronine deiodinase (18).
Since translation of the cDNA sequences linked to the remaining three SECIS elements, R71722, AA057045, and R44842, showed no homology to known selenoproteins, the cDNAs were termed SelN, SelX, and SelZ, respectively. The sizes of the SelN, SelX, and SelZ mRNAs were estimated by Northern blot analysis to be 4.5, 1.4, and 2.2 kilobases, respectively (Fig. 3). By screening a HeLa oligo(dT) library with a probe complementary to the SelX SECIS DNA, we identified a 1333-bp fragment presumably corresponding to the full-size SelX cDNA. The sequence analysis revealed the existence of a 345-bp-long ORF with an in-frame TGA codon at position 379 (Fig. 4). As expected for a selenoprotein mRNA, its SECIS element resides within the 3′-UTR. For SelN, querying EST data bases with BLAST identified a 2231-bp cDNA that was incomplete since the corresponding mRNA was 4.5 kilobases long (Fig. 3). Upstream sequences were thus obtained by screening a HeLa random-primed cDNA library and by 5′-Marathon RACE, extending the cDNA by 1718 bp. Assembled together, the fragments gave rise to a 3949-bp SelN cDNA, the sequence of which indicated that the reading frame was still open. However, as the 3949-bp SelN cDNA contained a 1414-bp ORF with a characteristic in-frame TGA codon at position 1028, it was used for subsequent analysis. Here also, the SECIS element occurred within the 3′-UTR of the SelN cDNA (Fig. 4).
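The ORF analysis described above has to tolerate an in-frame TGA that would normally terminate the reading frame. The following sketch illustrates one way to look for such frames in a cDNA sequence: it starts at each ATG, accepts the first in-frame TGA as a candidate selenocysteine codon, and ends at the next stop codon. This is an illustration under stated assumptions, not the procedure used in the study; the function name, the 0-based coordinates, and the `min_codons` cutoff are hypothetical, and the sketch does not check for a downstream SECIS element.

```python
STOPS = {"TAA", "TAG", "TGA"}


def orfs_with_tga(seq, min_codons=10):
    """Return (orf_start, orf_end, tga_position) tuples, 0-based with an
    exclusive end, for ATG-initiated frames that contain exactly one
    tolerated in-frame TGA before the terminating stop codon."""
    seq = seq.upper()
    results = []
    for start in range(len(seq) - 2):
        if seq[start:start + 3] != "ATG":
            continue
        tga_pos = None
        for i in range(start + 3, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if codon == "TGA" and tga_pos is None:
                tga_pos = i  # first in-frame TGA: candidate selenocysteine codon
                continue
            if codon in STOPS:
                # Report only frames that read through a TGA and are long enough.
                if tga_pos is not None and (i - start) // 3 >= min_codons:
                    results.append((start, i + 3, tga_pos))
                break
    return results


# Toy sequence: ATG, three GCT codons, TGA, two GCT codons, TAA.
toy = "ATG" + "GCT" * 3 + "TGA" + "GCT" * 2 + "TAA"
print(orfs_with_tga(toy, min_codons=3))
```

In practice, candidates from such a scan would still need supporting evidence, such as a SECIS element located in the 3′-UTR and conservation of the TGA position in orthologous sequences.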
The sequencing of the EST corresponding to SECIS R44842 revealed a 1505-bp cDNA containing an ORF that clearly extended upstream of the characterized sequence. This cDNA was called SelZ. Additional 5′-sequences were sought by 5′-Marathon RACE. Surprisingly, we obtained two different PCR fragments with different 5′-sequences. Each fragment, added separately to the SelZ cDNA, generated the 2021-bp SelZf1 and 2041-bp SelZf2 cDNAs. The 5′-sequences of these cDNAs differ upstream of positions 520 in SelZf1 and 540 in SelZf2 and are followed by the common SelZ region (Fig. 4). Since the corresponding transcripts are approximately the same size, they could not be distinguished by Northern blot analysis with a probe complementary to the common SelZ sequence (Fig. 3). Putative ATG initiation codons were identified by the presence of upstream sequences homologous to the Kozak consensus sequence (19) at positions 816 in SelZf1 and 383 in SelZf2. A TGA codon was found in the common region, potentially encoding a selenocysteine at the C-terminal penultimate position in both proteins. For SelZf1 and SelZf2, the SECIS element was localized 250 bp downstream of the putative TAA stop codon (Fig. 4).
Can the New SECIS Elements Mediate Readthrough of the Selenocysteine Codon in Their Own mRNA Contexts?-SelX and SelN were fused at the N terminus to an HA tag, generating constructs HASelX and HASelN, respectively. In SelZf1 and SelZf2, the putative selenocysteine codon resides at the penultimate C-terminal position in a domain common to both proteins. Therefore, only the SelZ common region was epitope-tagged at the N terminus, giving rise to HASelZ. After transfection of the constructs into COS-7 cells, the tag allowed immunodetection by the anti-HA antibody of the proteins contained in the cell extracts, hence evaluation of their sizes. In ΔSECIS constructs, the absence of the SECIS element should convert the UGA selenocysteine codon to a stop codon, thus producing a shortened polypeptide. Based on the cDNA sequence, HASelX should generate either 17.2- or 15-kDa proteins, according to selenocysteine codon readthrough. Transfection of HASelX indeed generated a major product at ~16 kDa, but also a minor one at ~10 kDa (Fig. 5A, lane 4), possibly arising from inefficient selenocysteine codon readthrough (6). Construct HASelXΔSECIS, as anticipated, produced almost exclusively the shortest form (lane 5). The faint SECIS-independent 16-kDa band was reminiscent of what was observed with GPx (lane 3) and other selenoproteins (5).
A 58-kDa product corresponding to the full-length protein produced by HASelN was expected. Indeed, synthesis of a 60-kDa protein was observed (Fig. 5A, lane 6). Even though a shorter product of 51 kDa showed up both in the presence and absence of the SECIS element (compare lanes 6 and 7), it must be stressed that the expected full-length 60-kDa protein appeared only in the presence of the SECIS element. Since the UGA codon is located at the penultimate position in the SelZ mRNA, we should not expect a difference in the mobilities of the full-length 48-kDa and UGA-terminated proteins. This is effectively what happened (lanes 8 and 9). We concluded from these experiments that the SECIS elements in the SelX and SelN mRNAs function to mediate readthrough of the selenocysteine codon, with the only ambiguity remaining for SelZ.
SelX, SelN, and SelZ Are Selenoproteins-To solve the SelZ ambiguity, but also to ascertain that the new cDNAs do encode selenoproteins, in vivo labeling was performed by growing transiently transfected COS-7 cells in a medium containing Na₂⁷⁵SeO₃. The HA-tagged proteins were immunoprecipitated from the cell extracts with the anti-HA antibody and fractionated by SDS-polyacrylamide gel electrophoresis. The immunoprecipitation and the difference in size arising from the tag enabled the specific detection of the recombinant selenoproteins. For SelX, SelN, and SelZ, a ⁷⁵Se-labeled product was obtained only with the SECIS-containing cDNAs (Fig. 5B, compare lanes 4 and 5, 6 and 7, and 8 and 9). The positions of the bands correlated with the protein sizes predicted from the cDNA lengths and with those on the Western blot in Fig. 5A. The variable intensities of the bands may be accounted for by differential mRNA or protein stabilities or by different activities carried by different SECIS elements, as previously observed in other contexts (20). In the control experiment, the full-length GPx protein was accompanied by a lower molecular mass product of ~22 kDa, which could arise from proteolysis (lane 2). Notably, no ⁷⁵Se-labeled band was detected, even after long exposure (data not shown), for the full-length GPx, SelX, and SelN proteins that had been observed on the Western blots in the absence of SECIS elements (Fig. 5A). It may well be that these selenium-lacking proteins originated from weak unspecific readthrough of the selenocysteine codon under our experimental conditions.
These results conclusively demonstrate that SelX, SelN, and SelZ are indeed selenoproteins. Because SelZ exists in two isoforms, this corresponds to four novel selenoproteins: SelX, SelN, SelZf1, and SelZf2. Since the corresponding cDNAs each contain an in-frame TGA codon and a SECIS element, the selenium labeling experiments strongly argue in favor of specific selenocysteine incorporation.
Searching Functions for the New Selenoproteins-Northern blot analysis was performed to determine possible tissue-specific expression of SelX, SelN, and SelZ (Fig. 3). SelN mRNA was ubiquitously expressed, with, however, a higher accumulation in the pancreas, ovary, prostate, and spleen. The distribution of the SelX mRNA was less homogeneous than that of SelN, being preponderant in the liver and leukocytes, abundant in the pancreas, but low in the lung, placenta, and brain. SelZ mRNA showed more pronounced accumulation in the kidney, liver, testis, and prostate, but was low in the thymus.
In the course of this study, the cDNA for the selenoprotein TrxR2, a mitochondrion-specific thioredoxin reductase isoform, was cloned independently by several groups (21-23). Sequence comparisons between the SelZf1, SelZf2, and TrxR2 cDNAs, depicted schematically in Fig. 7A, indicated that they share a large common domain. The SelZf1 and TrxR2 cDNA sequences are identical from the 3′-end to residue 636 of TrxR2. In the SelZf2 cDNA, the region conserved with TrxR2 extends up to position 293 of TrxR2. The common region in the three cDNAs includes the 3′-part of the coding sequence with the in-frame TGA codon and the 3′-UTR, with sequence differences occurring at their 5′-ends. The three cDNAs encode three different proteins sharing a common core, but with different N-terminal domains.
Alignment of the human SelN DNA sequence with ESTs or of the SelN protein sequence with translated ESTs revealed the existence of a hypothetical ortholog in mouse and rat. The number of different ESTs was insufficient for reconstitution of complete cDNAs, but the partial assembled sequences showed conservation of the coding frames, in-frame TGA codons, and SECIS elements.
We next sought homologs to SelX. A mouse cDNA covering the entire length of the human SelX cDNA was reconstituted in silico by merging various overlapping mouse ESTs. The translated mouse cDNA showed 91% amino acid identity to the human SelX protein. Furthermore, data base searches found SelX sequence similarities to plant and Drosophila translated ESTs, but also to prokaryotic, yeast, and Caenorhabditis elegans ORFs indexed as hypothetical proteins of unknown function. Displayed in Fig. 6, these findings show striking amino acid identities between, for example, human SelX and Escherichia coli P39903 (24%), C. elegans P34436 (28%), and Drosophila EST AA540562 (28%). The comparison also stressed the 29% amino acid identity of the human and mouse SelX proteins to a domain of the Neisseria gonorrhoeae, Hemophilus influenzae, Helicobacter pylori, Mycoplasma capricolum, and Streptococcus pneumoniae PILB proteins, regulators of bacterial pilus formation (24). Although the sequences are similar over their entire lengths, the alignment highlights two blocks of higher sequence conservation: PWPAF (block 1) and GLGHEF (block 2).

DISCUSSION
The objective of our study was the isolation of new selenoprotein cDNAs. The existence of selenoproteins other than those previously characterized was predicted by workers based on selenium labeling experiments, but did not lead to amino acid sequence data. To circumvent the lack of protein sequence information, we assumed that a number of the desired cDNA sequences were already deposited in the EST data bases. To exploit this information, our strategy took advantage of the obligatory presence of a SECIS element in all selenoprotein mRNAs. This differs from conventional screens in two respects. The SECIS hairpin being characterized more by the high conservation of its secondary structure than by the extent of invariant sequences, alignment methods such as BLAST and FASTA were inappropriate. The originality of our approach was the use of a program capable of detecting RNA foldings such as the SECIS consensus secondary structure. Another and probably the most important aspect of our screen is that selenoprotein cDNAs contain TGA codons, obviously rendering the identification of an ORF more challenging than in other cDNAs where TGA signals the end of the ORF. Notwithstanding, the strategy paid off since the RNA structure alone was sufficient to discover four novel different selenoproteins.
Only seven of the 21 SECIS candidates selected in silico corresponded to functional SECIS elements. This came as a surprise, since the inactive candidates harbored the features defined by the SECIS consensus structure. Several possibilities can explain this paradoxical situation. The SECIS losers may lack one or more essential sequences or base pairs that were unintentionally omitted from the SECIS descriptor because they had not yet been identified in the SECIS elements known at the time. Alternatively, the SECIS losers may contain sequence or base pair anti-determinants preventing them from functioning. Finally, the sequences may fold in vivo into structures slightly different from the expected one.
[Figure legend fragment] … 5, 7, and 9)) into COS-7 cells, the HA-tagged proteins were revealed by Western blot analysis with the anti-HA antibody. Control lanes are the same as described for Fig. 2A. Migrations in lanes 1-5 and 6-9 were on 10 and 12% gels, respectively. Arrows point to the translation products mentioned under "Results"; asterisks indicate unspecific products. B, SelN, SelX, and SelZ are selenoproteins. Transfected COS-7 cells were cultured in the presence of 75Se. The HA-tagged 75Se-labeled proteins were immunoprecipitated, fractionated on a 12% gel, and revealed by autoradiography.
Three SECIS elements among the seven winners led to the discovery of the SelN, SelX, and SelZ selenoprotein mRNAs, SelZ giving rise to the SelZf1 and SelZf2 isoforms. In vivo expression of the selenoprotein mRNAs indicated that selenocysteine incorporation was actually dependent on the presence of the SECIS element. No sequence similar to SelN could be found in protein or nucleotide sequence data bases. However, similarity searches were productive with SelX and SelZ. The amino acid comparisons in Fig. 6 underscored two prominent features of SelX. First, sequences similar to mammalian SelX were detected in all kingdoms. The human and mouse sequences had 24-28% amino acid identity to ORFs of unknown function in E. coli and C. elegans and to a Drosophila EST. Second, we found that mammalian SelX displayed 29% amino acid identity to a domain of PILB, a protein involved in pilus formation in the bacteria N. gonorrhoeae, H. influenzae, H. pylori, M. capricolum, and S. pneumoniae (Fig. 6). PILB possesses a peptide methionine-sulfoxide reductase activity (25). Sequence comparisons established that this activity resides in a PILB subdomain distinct from the region similar to SelX. The situation differs in E. coli, where the peptide methionine-sulfoxide reductase activity is borne by MsrA, a polypeptide different from P39903, one of the hypothetical proteins identified by similarity to SelX (Fig. 6). From these observations, it appears that SelX constitutes a functional module, acting per se or associated with peptide methionine-sulfoxide reductase in the bacterial polyprotein PILB. The conserved amino acids in blocks 1 and 2 as well as the selenocysteine (Fig. 6) very likely play important roles in the function of SelX.
The C-terminal domains of SelZf1 and SelZf2 show clear similarity to the corresponding domain of the selenoprotein TrxR2. Interestingly, it was shown that the 293 bp at the 5′-end of the TrxR2 cDNA encode the mitochondrial targeting peptide (23), which is not found in SelZf2 (Fig. 7A). More surprisingly, the region of the cDNAs encoding the CVNVGC active site, common to the mitochondrial and cytoplasmic thioredoxin reductases and to the glutathione reductase (22), was found in the SelZf2 cDNA, but not in the SelZf1 cDNA. This suggests that SelZf1 has a function different from that of SelZf2 and TrxR2. In the course of searching for sequences similar to the SelZf1 and SelZf2 cDNAs, we identified genomic fragments (GenBank™ accession numbers AC000079 and AC000080) with similarity to both cDNAs. An identical genomic fragment was also shown independently by others (23) to contain sequences encoding TrxR2. Alignment of the SelZf1, SelZf2, and TrxR2 cDNA sequences with the genomic sequence yielded the putative assembly pattern in Fig. 7B, obtained by removing the introns. We could see that those domains that differ between the three cDNAs (extending from the 5′-ends to positions 521, 198 and 541, and 293 and 636 in SelZf1, SelZf2, and TrxR2, respectively) correspond to distinct genomic segments. The three cDNAs should arise from the same gene, probably by alternative splicing resulting in the addition of different 5′ segments to a common core to generate three different selenoproteins with specialized functions or localizations.
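The "different 5′ segments joined to a common core" model can be made concrete with a small sketch that, given several cDNA sequences, extracts the longest 3′ region shared by all of them and reports the isoform-specific 5′ remainder of each. This is a simplified illustration under the assumption that the isoforms differ only at their 5′ ends and are identical thereafter; the function names, isoform labels, and toy sequences are hypothetical and are not the SelZf1, SelZf2, or TrxR2 sequences.

```python
def common_3prime_core(seqs):
    """Return the longest suffix shared by every sequence in seqs."""
    if not seqs:
        return ""
    shortest = min(len(s) for s in seqs)
    n = 0
    # Walk backwards from the 3' end while all sequences agree.
    while n < shortest and len({s[-1 - n] for s in seqs}) == 1:
        n += 1
    first = seqs[0]
    return first[len(first) - n:]


def variable_5prime_segments(named_seqs):
    """Map each isoform name to the 5' segment preceding the common core."""
    core = common_3prime_core(list(named_seqs.values()))
    return {name: s[: len(s) - len(core)] for name, s in named_seqs.items()}


if __name__ == "__main__":
    # Toy isoforms (hypothetical): distinct 5' ends, identical 3' core.
    isoforms = {
        "isoform_a": "AAACCGGTTACG",
        "isoform_b": "TTGGTTACG",
        "isoform_c": "CCCCAGGTTACG",
    }
    print("core:", common_3prime_core(list(isoforms.values())))
    for name, segment in variable_5prime_segments(isoforms).items():
        print(name, "5' segment:", segment)
```

On real cDNAs, sequencing errors and internal differences would break an exact suffix comparison, so an alignment against the genomic sequence, as described in the paragraph above, remains the appropriate way to assign each segment to its exon.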
Previous reports underscored the relevance of computational searches for identifying RNA structure motifs (26-28). Recently, a computational screen using an original algorithm was employed to uncover methylation guide small nucleolar RNAs in the yeast genome (29). The distinctive feature of our study is that the strategy employed led to the discovery of four novel selenoproteins. This once again illustrates the value of mRNA 3′-UTRs as a repository of functional RNA motifs instrumental in post-transcriptional control. Undoubtedly, with hundreds of new EST sequences deposited every day in the data bases and with the completion of the human genome sequencing project in prospect, this strategy will enable more selenoproteins to be discovered. It could also be extended to the discovery in other organisms of mRNAs whose stability or localization is mediated by common structural motifs in the 3′-UTR.