New Mammalian Selenocysteine-containing Proteins Identified with an Algorithm That Searches for Selenocysteine Insertion Sequence Elements*

Mammalian selenium-containing proteins identified thus far contain selenium in the form of a selenocysteine residue encoded by UGA. These proteins lack common amino acid sequence motifs, but 3′-untranslated regions of selenoprotein genes contain a common stem-loop structure, selenocysteine insertion sequence (SECIS) element, that is necessary for decoding UGA as selenocysteine rather than a stop signal. We describe here a computer program, SECISearch, that identifies mammalian selenoprotein genes by recognizing SECIS elements on the basis of their primary and secondary structures and free energy requirements. When SECISearch was applied to search human dbEST, two new mammalian selenoproteins, designated SelT and SelR, were identified. We determined their cDNA sequences and expressed them in a monkey cell line as fusion proteins with a green fluorescent protein. Incorporation of selenium into new proteins was confirmed by metabolic labeling with 75Se, and expression of SelT was additionally documented in immunoblot assays. SelT and SelR did not have homology to previously characterized proteins, but their putative homologs were detected in various organisms. SelR homologs were present in every organism characterized by complete genome sequencing. The data suggest applicability of SECISearch for identification of new selenoprotein genes in nucleotide data bases.

described in the nematode, Caenorhabditis elegans (6,7). Interestingly, no selenoprotein genes are present in the genome of the yeast Saccharomyces cerevisiae.
Sec-containing proteins are more common in mammals, in which 14 selenoproteins have been found to date (8). These include four types of glutathione peroxidase (9,10), three types of thyroid hormone deiodinase (11,12), three types of thioredoxin reductase (13), selenophosphate synthetase 2 (14), selenoprotein W (15), selenoprotein P (16), and the 15-kDa selenoprotein (17). Thirteen of these proteins contain a single Sec residue that is conserved among mammalian sequences and is often present at the enzyme redox active center. Selenoprotein P, which is the major selenium-containing protein in plasma, is an exception in that it contains 10-12 selenocysteine residues depending on the host species (16).
The 3′-untranslated regions (UTRs) of all mammalian selenoproteins contain a stem-loop structure, designated as the selenocysteine insertion sequence (SECIS) element (18). This element is essential for recognition of a UGA codon within the coding region of selenoprotein mRNA as a signal for Sec incorporation (19). SECIS elements have several conserved features depicted in Fig. 1, A-D. The general structure of the SECIS elements, and in particular the Quartet (also called a SECIS core) of non-Watson-Crick interacting nucleotides and a double A motif in the apical loop, are essential for SECIS element function (20-23).
The presence of a Sec residue in regions that are essential to the function of selenoproteins explains many, if not all, of the biological effects of selenium when this micronutrient is present at suboptimal levels in the diet. Selenium deficiency results in decreased levels of selenium-containing proteins (24), and insufficient selenium levels were also associated with a decreased survival rate of HIV-infected patients (25), an increased rate of cancer incidence (26), and several other health disorders (8,27). Supplementation of the human diet with selenium offers a potentially effective means of preventing or diminishing human maladies. For example, dietary supplementation with selenium resulted in a 48-63% reduction in the incidence of prostate, lung, and colon cancers in a human clinical trial (26). The essential role of selenium in mammalian development is illustrated by the finding that disruption of the mouse Sec-tRNA gene results in early embryonic lethality (28). It is not known which selenoprotein(s) is (are) responsible for this effect.
Identification and characterization of new Sec-containing proteins is an important area of research that may take advantage of the available genome sequencing data and help to elucidate many biological effects of selenium. No algorithms are currently available that would correctly predict selenoprotein gene sequence in nucleotide data bases because selenoproteins have diverse functions (8) and lack a common amino acid consensus sequence.
In this report, we describe a computer program, SECISearch, that recognizes SECIS elements in nucleotide sequences. Using this program and additional criteria for recognition of mammalian selenoprotein genes, we found the genes for two new Sec-containing proteins, SelT and SelR, which do not have homology to previously characterized proteins. In addition, we provide experimental evidence for the expression of SelT and SelR in mammalian cells. It should also be noted that SelR is the first selenoprotein that has a direct homolog in a "minimal gene set for cellular life" (30).

EXPERIMENTAL PROCEDURES
SECISearch-A program for SECIS element recognition in nucleotide sequences, SECISearch, is based on an algorithm with a three-step search strategy: 1) primary sequence identification; 2) secondary structure prediction; and 3) minimum free energy estimation (Fig. 2A). The program is composed of two modules. The first module is responsible for recognition of the SECIS element primary sequence and secondary structure consensuses and is written on the basis of the PatScan program. Nucleotide sequences that satisfy both primary and secondary structure constraints are further analyzed in the second module, which estimates the free energy of the stem-loop structure. This module is based on the Vienna RNA package RNAfold program for secondary structure prediction and free energy evaluation. It separately estimates the free energies for the Helix I plus internal loop and Helix II plus apical loop regions of the putative SECIS element (Fig. 2B). The free energy cutoff parameters are based on free energy minimization calculations of SECIS elements in previously characterized human selenoprotein mRNAs and were set as −7 kcal/mol for Helix I and internal loop and −5 kcal/mol for Helix II and apical loop. In the absence of a reliable algorithm that correctly predicts the three-dimensional structure of mRNA, these numerical values are relative but reflect the relationship between the values of the free energy of folding predicted for SECIS elements in previously characterized selenoproteins and stem-loop structures selected by SECISearch. The minimum free energy algorithm derived from the Vienna RNA package and employed by SECISearch is also present in the popular mfold program (31). The two modules of SECISearch were written in C and compiled by the GNU C compiler for the Win32 platform. Perl scripts were used for module interaction and for presentation of the SECISearch data for further analyses.
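The two-threshold energy filter described above can be illustrated with a minimal Python sketch. It is not the SECISearch code (which was written in C and Perl); the class name, function name, and free energy values below are hypothetical, and the sketch assumes that a candidate passes when each predicted free energy is equal to or more negative than its cutoff. In practice the two values would come from a folding program such as RNAfold.

```python
from dataclasses import dataclass

# Cutoffs quoted in the text (kcal/mol).
HELIX1_CUTOFF = -7.0   # Helix I plus internal loop
HELIX2_CUTOFF = -5.0   # Helix II plus apical loop


@dataclass
class Candidate:
    name: str          # hypothetical label for a candidate stem-loop
    dg_helix1: float   # predicted free energy, Helix I + internal loop
    dg_helix2: float   # predicted free energy, Helix II + apical loop


def passes_energy_filter(c: Candidate) -> bool:
    # Assumption: "passing" means at least as stable as the cutoff,
    # i.e. a free energy equal to or more negative than the cutoff.
    return c.dg_helix1 <= HELIX1_CUTOFF and c.dg_helix2 <= HELIX2_CUTOFF


candidates = [
    Candidate("cand_1", -12.3, -9.8),   # made-up values, both regions stable
    Candidate("cand_2", -6.1, -10.0),   # Helix I region above its cutoff
    Candidate("cand_3", -15.0, -3.2),   # Helix II region above its cutoff
]
kept = [c.name for c in candidates if passes_energy_filter(c)]
print(kept)
```

Treating the two regions as independent thresholds mirrors the separate estimation of the lower and upper portions of the stem-loop described in the text.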
Computer Searches-Human expressed sequence tag data base (dbEST) (March 1999 release) was searched with SECISearch (Fig. 2A) with parameters indicated in Table I. SECIS elements in the SelP gene were obvious outliers in terms of Helix II plus apical loop energy (Fig. 2B). The use of energy cut-off parameters that excluded these SECIS elements allowed us to reduce the number of sequences selected by SECISearch over 6-fold and increase the proportion of sequences that corresponded to known selenoprotein genes from ~10 to ~70%. Nucleotide sequences that were selected by the program were manually analyzed against NCBI non-redundant (NR) and EST data bases and against the TIGR tentative human consensus data base. Sequences corresponding to the known selenoprotein genes were excluded, the remaining sequences were further extended by computer search analyses of dbEST, and consensus sequences were obtained on the basis of multiple matches in all three data bases. Extended sequences were analyzed for the presence of open reading frames (ORFs), allowing in-frame TGA codons to be interpreted as either Sec residues or stop signals. ORFs containing in-frame TGA codons were identified and analyzed by computer search analyses for the presence of features characteristic of mammalian selenoprotein genes. Specifically, conservation of a TGA codon in nucleotide sequences, of putative Sec-flanking areas in amino acid sequences, and of SECIS elements in the 3′-UTR was tested in mammalian sequences. ORFs were further searched for homologies to NR and EST sequences using BLAST programs, and sequences homologous to putative selenoprotein sequences and containing a cysteine codon in place of a Sec-encoding TGA codon in a putative selenoprotein were identified.
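The ORF analysis step, in which in-frame TGA codons are allowed to stand for Sec rather than terminate the frame, can be sketched as follows. This is an illustrative Python reconstruction under stated assumptions, not the procedure actually used: the function name, the `min_codons` threshold, and the demo sequence are hypothetical, and only the forward strand is scanned.

```python
import re

ALWAYS_STOP = {"TAA", "TAG"}  # TGA is kept open as a possible Sec codon


def orfs_with_inframe_tga(seq, min_codons=10):
    """Return (start, end, tga_offsets) for each ATG-initiated reading frame
    that runs to the first in-frame TAA/TAG and contains an in-frame TGA.
    Offsets are 0-based; end is exclusive and includes the stop codon."""
    seq = seq.upper()
    hits = []
    for m in re.finditer("ATG", seq):
        start, tga_offsets = m.start(), []
        for i in range(start, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if codon == "TGA":
                tga_offsets.append(i)          # candidate Sec (or stop)
            elif codon in ALWAYS_STOP:
                if tga_offsets and (i - start) // 3 >= min_codons:
                    hits.append((start, i + 3, tga_offsets))
                break
    return hits


# Made-up sequence: ATG, 3 codons, TGA, 8 codons, TAA.
demo = "ATG" + "GCT" * 3 + "TGA" + "GCT" * 8 + "TAA"
print(orfs_with_inframe_tga(demo))
```

Each reported frame would still need the conservation and homology checks described above before it could be considered a candidate selenoprotein ORF.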
cDNA Sequencing-EST clones (number 590409 from "Stratagene endothelial cell 937223" library, accession number AA156969, and number 736225 from "Soares mouse 3NME12 5" library, accession number AA270410) were obtained from Research Genetics, Inc. Plasmids were isolated using a Nucleobond AX100 Kit (CLONTECH). The nucleotide sequences were determined using the Dye Terminator Cycle Sequencing method.
Constructs with Green Fluorescent Protein (GFP)-GFP-SelT and GFP-SelR constructs were made on the basis of the pEGFP-C3 expression vector (CLONTECH). In each of these constructs, GFP was located upstream of a selenoprotein gene. Human SelT cDNA was amplified with primers W5U119, 5′-GTGCCAGCAAGAGATTAAAG-3′, and W5L791, 5′-GCCAGCACTGAAACACTATC-3′, and cloned into the SmaI site of pEGFP-C3. The SelR cDNA was directly cloned into the XhoI/BamHI sites of pEGFP-C3. Both plasmids were transformed into E. coli strain NovaBlue (Novagen), and the plasmids were isolated using a Plasmid Maxi Kit (Qiagen).
Cell Growth, Transfection, and Metabolic Labeling with 75Se-A monkey CV-1 cell line was grown on Dulbecco's modified Eagle's medium supplemented with 10% fetal calf serum to ~80% confluence, and transfection was carried out using LipofectAMINE reagent (Invitrogen) according to the manufacturer's protocol for attached cells. 3 μg of DNA and 15 μl of LipofectAMINE were used for each 60-mm plate. Cells were labeled between 6 and 36 h after transfection with 100 μCi of freshly neutralized [75Se]selenious acid (1000 Ci/mmol, Research Reactor Facility, University of Missouri, Columbia, MO) as described (17). Cells were washed four times with phosphate-buffered saline to remove the remaining 75Se and harvested, and the samples were analyzed on 10% SDS-NuPAGE gels (Novex). 75Se-labeled proteins were visualized on SDS-PAGE gels with a Storm PhosphorImager system (Molecular Dynamics).
Immunoblot Detection-Rabbit polyclonal antibodies were raised against a synthetic polypeptide corresponding to residues 148-163 of the human SelT protein. This peptide was conjugated to keyhole limpet hemocyanin before injection into a rabbit. Western blot analyses were performed with an ECL system (Amersham Pharmacia Biotech).

RESULTS
SECIS Element Consensus Structure-Mammalian SECIS elements (18-22) are composed of Helices I and II, internal and apical loops, and a non-Watson-Crick base paired SECIS core, Quartet (Fig. 1, A-D). Conserved nucleotides in the SECIS element sequence are an A directly preceding the Quartet, TGA in the 5′ segment and GA in the 3′ segment of the Quartet, and an unpaired AA in the apical loop. Another characteristic feature of a mammalian SECIS element is the length of Helix II, which separates the non-Watson-Crick Quartet and AA in the apical loop by 11-12 nucleotides.
Two distinct but related models were recently proposed for the SECIS element consensus sequence and structure. Krol and collaborators (20) suggested a single consensus for all mammalian SECIS elements (Fig. 1A), while Berry and collaborators (22) divided SECIS elements into two distinct subfamilies, type I and type II (Fig. 1, B and C). These subfamilies were different in the area of the apical loop; that is, type II SECIS elements had an additional mini-stem that placed AA in the bulge. These two types of SECIS elements were interconvertible by mutations that remove/create the mini-stem suggesting a similar structure and function of both SECIS element types (22). Even though the consensus sequences shown in Fig. 1, A-C, indeed represent SECIS elements in selenoprotein mRNA, we analyzed known SECIS elements from a different perspective and propose a consensus that utilizes the free energy of a stem-loop structure in addition to primary and secondary consensus structures (Fig. 1D). When free energies predicted from a computer analysis of SECIS element folding were analyzed for known SECIS elements, all Sec-inserting stem-loop structures exhibited similar free energy parameters. The free energy values for Helix I plus internal loop and for Helix II plus apical loop were determined separately and plotted against each other (Fig. 2B). No differences were observed in the free energy values for type I and type II SECIS elements shown in Fig. 1, B and C, suggesting that the mini-stem serves to stabilize the stem-loop structure. The mini-stem is formed when an apical loop is large enough to destabilize the structure, but it may not be required for the SECIS elements with smaller apical loops.
These considerations are similar to the suggestion that the mini-stem may serve to maintain thermodynamic stability or to nucleate SECIS element folding (22). The commonality in the energetic criteria for SECIS elements allowed us to maintain a single model, depicted in Fig. 1D, for designing an algorithm that searches for novel SECIS elements. Additional features that distinguish the SECIS element shown in Fig. 1D from previously proposed models, Fig. 1 [...] identifying a distance between the unpaired AA and the 3′ segment of the Quartet; and (c) the use of the unpaired AA instead of AA(A/G) in the apical loop.

FIG. 1. [...] (20,21). B, type I SECIS element consensus proposed by Berry and collaborators (19,22). C, type II SECIS element consensus proposed by Berry and collaborators (19,22). D, SECIS element consensus used in SECISearch. E, SECIS element in human SelT mRNA. F, SECIS element in human SelR mRNA.
SECISearch-A computer program, SECISearch, was developed for identification of SECIS elements in nucleic acid sequences. The algorithm includes three major steps: 1) primary consensus sequence search; 2) analysis of mRNA secondary structure; and 3) estimation of the free energy for the predicted secondary structure. The descriptor was developed on the basis of the SECIS element consensus (Fig. 1D) and adjusted to recognize SECIS elements in known selenoproteins. The free energy values were estimated for known SECIS elements and the lowest negative free energy cutoff parameters were determined (Fig. 2B). The on-line version of SECISearch was also developed and designed to provide researchers with a tool to test nucleotide sequences for the presence of potential SECIS elements. It should be noted, however, that recognition of a SECIS element in a nucleotide sequence by SECISearch does not identify the sequence as a portion of a selenoprotein gene containing a functional SECIS element. Instead, this analysis is a first step in testing nucleotide sequences for the presence of a SECIS element, and further experimental and/or computer analyses are required (see below).
Computer Search-SECISearch was applied to search dbEST and NR for nucleotide sequences containing SECIS elements. The search strategy involving this program and further analysis of ESTs selected by SECISearch is summarized in Fig. 2A. SECISearch was highly selective for authentic SECIS elements. Depending on the parameters applied for a particular search (free energy of Helix I/internal loop, free energy of Helix II/apical loop, length of Helices I and II, etc.), ~10-70% of the sequences selected by the program in dbEST corresponded to SECIS elements in previously characterized selenoprotein genes. An example of SECISearch analysis of human dbEST is shown in Table I. The primary sequence consensus step reduced the number of tested ESTs to 14.3%, and only 2.6% (32,652 ESTs) of the initial 1,253,123 sequences satisfied the secondary structure consensus. The criteria used in primary sequence and secondary structure searches were sufficient to detect all known selenoproteins in NR. Calculations of the free energy for the predicted stem-loop structures with thermodynamic parameters that satisfy 12 out of 13 selenoproteins with known SECIS elements reduced the number of ESTs to 0.078% (974 individual ESTs). Eleven known selenoproteins were represented in this subset of human ESTs by 678 sequences. The remaining 296 ESTs were further grouped by multiple sequence alignments, and each group was manually analyzed with the help of BLAST programs, resulting in detection of two new selenoproteins (Fig. 1, E and F) that were represented by 35 ESTs.
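The percentages quoted for the dbEST search follow directly from the EST counts given in the text, as the short Python check below shows. The variable names and the helper `pct` are ours; only the counts come from the text.

```python
total = 1_253_123          # human ESTs searched
after_structure = 32_652   # satisfied primary and secondary structure consensus
after_energy = 974         # also satisfied the free energy cutoffs
known = 678                # ESTs from previously known selenoproteins
novel = 35                 # ESTs representing the two new selenoproteins


def pct(part, whole):
    """Percentage of `whole` represented by `part`."""
    return 100.0 * part / whole


unclassified = after_energy - known
print(f"structure step:  {pct(after_structure, total):.1f}% of all ESTs")
print(f"energy step:     {pct(after_energy, total):.3f}% of all ESTs")
print(f"known fraction:  {pct(known, after_energy):.0f}% of selected ESTs")
print(f"left to analyze: {unclassified} ESTs, of which {novel} were SelT/SelR")
```

The known-selenoprotein fraction of the 974 selected ESTs is consistent with the upper end of the ~10-70% range reported for different parameter settings.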
However, not all of the SECISearch-selected sequences were derived from selenoprotein genes. This is likely due to a low degree of sequence and structure conservation in mammalian SECIS elements and perhaps due to a limited knowledge of actual SECIS element structure and of additional cis-acting elements involved in Sec insertion into mammalian proteins.
Thus, in addition to functional SECIS elements, non-functional, "pseudo" SECIS elements were found by the current version of SECISearch during analyses of nucleotide data bases. A number of such SECIS-like non-functional structures were immediately recognized due to their presence in 5′-UTRs and coding regions (as well as in 3′-UTRs if translation was terminated at TAA or TAG) of known proteins. However, pseudo SECIS elements may also be present in the 3′-UTR of proteins in which TGA signals terminate translation, but certain properties of mRNA, such as the distance between TGA and the SECIS element, or mRNA tertiary structure, prevent the SECIS element from signaling Sec incorporation. It has been established, for instance, that a TGA-SECIS element distance of <51 nucleotides was insufficient for decoding of TGA as Sec (23), while a distance of >204 nucleotides was sufficient (32). Analyses of SECISearch-selected ESTs were also occasionally complicated by the uncertainty of the correspondence of an entry to a coding or complementary strand.
However, we found that, even if the majority of SECISearch-selected ESTs were classified as SECIS elements in known selenoproteins or as pseudo-SECIS structures, the number of remaining nucleotide sequences was typically too large to test in a cell line system. Indeed, experimental screening of the subset of nucleotide sequences that are selected by SECISearch [...]

FIG. 2. [...] Nucleotide sequences in dbEST were analyzed with the SECISearch program, which sequentially searched for the primary consensus sequence, secondary consensus structure, and the free energy parameters characteristic of SECIS elements in known selenoprotein genes. Selected EST sequences were manually analyzed for the presence of open reading frames that satisfy MSGS criteria. Candidate selenoproteins were experimentally characterized by transfection of a mammalian cell line with a GFP-selenoprotein construct and detection of the fusion protein by 75Se labeling. B, free energy plot for SECIS elements found in human selenoprotein mRNAs. Free energy values, ΔG (in kcal/mol), for Helix I plus internal loop and for Helix II plus apical loop were calculated separately for each SECIS element and plotted against each other.
Analysis of putative Sec-flanking areas in cDNAs selected by SECISearch provides such additional criteria for selection of selenoprotein genes. We recently proposed a set of criteria, designated as the mammalian selenoprotein gene signature (MSGS), that are helpful for identifying selenoprotein mRNA sequences. According to MSGS, a typical new selenoprotein will not only contain a SECIS element in the 3′-UTR of its mRNA, but this SECIS element will also be conserved among mammalian mRNAs for this protein. In addition, Sec and Sec-flanking regions for this protein should be conserved in mammalian amino acid sequences for this protein, and homologous sequences for this protein (most often in lower eukaryotes) should contain a cysteine residue in place of Sec (or these homologous sequences should conserve Sec). These criteria are consistent with the sequences of all known mammalian selenoproteins. It should be noted, however, that selenoprotein P contains both conserved and non-conserved Sec residues, and this protein only satisfies MSGS criteria based on its conserved selenocysteines.
Application of MSGS criteria to the remaining unclassified nucleotide sequences selected by SECISearch allowed us to classify a significant proportion of these sequences as those that are not derived from selenoprotein cDNAs. Still, a number of sequences in dbEST could not be classified, because these sequences were represented by an insufficient number of ESTs, were incomplete or erroneous, or their homologous sequences were not detected in other organisms. Accumulation of sequence data in nucleotide data bases will help us to classify these EST sequences.
Identification of SelT and SelR-Two sequences that were initially selected by SECISearch (Fig. 1, E and F) were represented by a large number of EST sequences, which allowed us to obtain complete human cDNA sequences for these proteins from multiple sequence alignments of ESTs. Analysis of open reading frames present in cDNAs revealed the presence of in-frame TGA codons in the coding regions and the presence of SECIS elements in 3′-UTRs. Further protein homology analyses found no homology to previously characterized proteins, but revealed characteristics that satisfied both SECISearch and MSGS criteria, suggesting that these sequences encoded selenoproteins. The new proteins were designated as SelT (Fig. 3A) and SelR (Fig. 3B).
SelT-The human EST clone containing SelT cDNA was obtained. Its sequence was experimentally determined and was in agreement with the cDNA consensus sequence obtained by multiple sequence alignments of ESTs. The cDNA sequence of SelT (Fig. 3A) was 1002 nucleotides long and contained an ORF of 163 amino acid residues with a calculated mass of 18.8 kDa. The Sec residue, Sec17, encoded by TGA, was located in the N-terminal portion of the protein. The SECIS element (Fig. 1E) was located 509 nucleotides downstream of the TGA codon for Sec determined as the distance between a SECIS core, the Quartet, and a Sec codon.
Further computer analyses of the SelT amino acid sequence revealed the presence of homologous sequences in other animals (Fig. 4A) as well as in plants. Interestingly, Sec was present in SelT homologs in Schistosoma mansoni and zebrafish, while Cys was present in place of Sec in SelT homologs in Drosophila melanogaster, C. elegans (Fig. 4A), and Arabidopsis thaliana. The SelT region containing Sec had a high degree of homology, and one of the conserved residues, Cys-14, was separated from Sec by two other amino acid residues. This putative redox center, CXXU, was similar to that found in thioredoxins and glutaredoxins (CXXC) (33), selenoprotein W (CXXU) (15), and several other redox active proteins.
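A search for the CXXU arrangement, and for the CXXC arrangement expected in Cys-containing homologs, can be sketched in Python as follows. The function name is hypothetical and the test sequence is made up to place Cys at position 14 and Sec (U) at position 17; it is not the SelT sequence.

```python
import re

# C, any two residues, then U (Sec) or C (as in Cys-containing homologs).
# The lookahead lets overlapping occurrences be reported as well.
REDOX_MOTIF = re.compile(r"(?=(C..[UC]))")


def find_redox_motifs(protein):
    """Return (1-based position of the first Cys, matched text) per hit."""
    return [(m.start() + 1, m.group(1))
            for m in REDOX_MOTIF.finditer(protein.upper())]


# Made-up peptide: 13 filler residues, then C-G-S-U at positions 14-17.
toy = "M" + "A" * 12 + "CGSUKK"
print(find_redox_motifs(toy))
```

A hit of this kind is only suggestive; many unrelated proteins contain a CXXC arrangement, so the motif would be interpreted together with the conservation evidence described above.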
Table II, A, shows the incidence of SelT EST clones in human tissues and organs for which at least one library has three or more independent SelT cDNAs in human dbEST. In addition, 18 other cDNA libraries from a variety of tissues and organs were represented by ESTs containing the SelT nucleotide sequence. The data suggest that SelT mRNA is expressed at low levels in a broad range of tissues and organs.
SelR-Nucleotide sequences corresponding to human SelR cDNA were abundant in dbEST and the complete human SelR cDNA sequence was obtained by multiple sequence alignments. Human SelR had a calculated mass of 12.6 kDa. The nucleotide sequence of a mouse EST clone containing a full-length cDNA sequence for SelR was experimentally determined, revealing an ORF that was highly homologous to the human SelR (Figs. 3B and 4B). The SECIS element (Figs. 1F and 3C) was located in the 3′-UTR, 142 and 554 nucleotides downstream of the Sec codon in mouse and human sequences, respectively. Sequences homologous to SelR were detected in other genomes. Mammalian SelR contained Sec, while homologs in non-mammalian eukaryotes (animals, plants, and yeast) and in prokaryotic organisms contained Cys in place of Sec (Fig. 4B). We have also detected the sequences of two mammalian proteins, SelR-c1 and SelR-c2, that were homologous to SelR (Fig. 4B). The ORFs for these proteins were assembled on the basis of multiple sequence alignments of EST sequences. These proteins contained Cys in place of Sec in SelR. Human SelR-c1 cDNA sequence has recently appeared in GenBank (accession number AA038899).

a Ten additional nucleotides upstream of the 5′ branch and 10 nucleotides downstream of the 3′ branch of predicted Helix I were considered in free energy estimation for Helix I plus the internal loop. This reflects the fact that, in most SECIS elements, Helix I is longer than the 7 nt assigned in the consensus secondary structure, and that Helix I could not be accurately predicted from that consensus.
Interestingly, SelR genes had direct homologs in all completely sequenced genomes, including several bacterial and archaeal genomes as well as the yeast S. cerevisiae and nematode C. elegans genomes. A direct homolog of the SelR gene was present in a minimal gene set for cellular life. This set of 256 protein-coding genes was initially obtained by direct comparison of the smallest known genome (468 protein genes; the SelR homolog is the MG446 gene), that of the Gram-positive bacterium Mycoplasma genitalium, with that of the Gram-negative bacterium Haemophilus influenzae (30). The minimal gene set contained the protein genes that are thought to be necessary and sufficient to sustain cellular life (30). Most of the proteins encoded in a "minimal gene set" have previously been characterized, but the function of the SelR homolog is not known.
The incidence of human SelR mRNA expression that was calculated as the number of SelR ESTs per 10,000 ESTs in a particular cDNA library is shown in Table II, B. SelR mRNA exhibits moderate levels of expression and is present in a variety of adult and fetal tissues. Estimation of mRNA expression levels as EST incidence is only semi-quantitative since many cDNA libraries represented in dbEST are normalized.
Genomic sequence for human SelR was obtained by searching NR with a human SelR cDNA sequence as template. One genomic clone, AC005363, contained the complete 5-kb SelR genomic DNA sequence that was organized in 4 exons and 3 introns. The 5′-UTR and the initiation codon were located in the first exon, while the Sec-encoding TGA and the entire 3′-UTR were present in the last exon. The genomic clone was derived from human chromosome 16p13.3.

Detection of SelT and SelR as Fusion Proteins with GFP-To demonstrate the occurrence of SelT and SelR in mammals, we initially expressed the proteins as fusion proteins with GFP. Coding regions of human SelT and mouse SelR cDNAs were cloned in a pEGFP-C3 vector that had a GFP gene upstream of the cloned protein. This allowed us to express proteins of a higher molecular weight than the predicted naturally occurring SelT and SelR. The expected masses of the GFP-SelT and GFP-SelR fusion proteins were ~48.9 and ~42.6 kDa, respectively. No selenoproteins of similar masses were previously detected in mammalian cell lines, a feature that facilitated detection of SelT and SelR fusion proteins by metabolic labeling of cells with 75Se. Hence, monkey CV-1 cells were transfected with GFP-SelT, GFP-SelR, and control plasmids and incubated in the presence of 75Se, and 75Se-labeled proteins were detected by SDS-PAGE and PhosphorImager analyses. Distribution of selenoproteins in CV-1 cells transfected with a control plasmid (Fig. 5A, lanes 2 and 3) was similar to that in other mammalian cell types, where 75Se-labeled proteins of 57 kDa (thioredoxin reductase) and 25 kDa (glutathione peroxidase) were among the most abundant selenoproteins (34). The cells transfected with the SelT fusion construct exhibited an additional band at ~49 kDa (Fig. 5A, lane 1), and the cells transfected with the SelR fusion construct exhibited a band at ~42 kDa (Fig. 5A, lanes 4 and 5), in complete agreement with the expected masses for the fusion proteins. These data established the expression of mammalian SelT and SelR, the presence of selenium in the proteins, and the presence of functional SECIS elements in the genes for the new selenoproteins.

FIG. 3. [...] Locations of SECIS elements in the 3′-UTR are shown by arrows, and nucleotides that represent the primary sequence SECIS element consensus are shown in bold. C, alignment of the predicted SECIS elements of the mRNAs encoding SelTs (human, rat, and S. mansoni sequences) and SelRs (human and mouse sequences). Accession numbers for ESTs corresponding to rat SelT, S. mansoni SelT, and human SelR SECIS element sequences are AI231051, AI395351, and AA896979, respectively.

FIG. 4. Multiple alignments of mammalian SelT and SelR with their homologs.
A, multiple alignment of human and mouse SelTs with homologs from other species. The mouse SelT sequence was assembled from 8 and the D. melanogaster sequence from 12 independent ESTs. The accession number for the C. elegans putative protein is CAB01692. The S. mansoni sequence was obtained by conceptual translation of a single EST (accession number AI067883). The zebrafish sequence was assembled from 3 ESTs (accession numbers AI497309, AI477145, and AI617064). B, multiple alignment of human and mouse SelR with eukaryotic, bacterial, and archaeal homologs. GenBank accession numbers for M. pneumoniae, E. coli, M. thermoautotrophicum, S. cerevisiae, A. thaliana, and C. elegans SelR homologs, and for human SelR-c1 protein are P75129, P39903, AAB85216, P25566, CAA17151, P34436, and AA038899, respectively. The human SelR-c2 sequence was assembled from five independent ESTs.
Immunoblot Detection of SelT-Rabbit polyclonal antibodies were developed against the synthetic peptide corresponding to the C-terminal portion of SelT. These antibodies specifically recognized the selenoprotein in CV-1 cells (Fig. 5B, lane 2). The antibodies were also sufficient to detect the GFP-SelT fusion protein (Fig. 5B, lane 1). Since antibodies to SelT recognized the protein region encoded by gene sequences located downstream of UGA, readthrough of this codon must have occurred, which is consistent with UGA being decoded as Sec. Overall, the immunoblot detection of SelT provided additional evidence for the presence of this selenoprotein in mammalian cells.

DISCUSSION
We have described herein the identification of two new Sec-containing proteins through an algorithm that searches nucleotide sequence data bases for mammalian SECIS elements. Fourteen mammalian selenoproteins that were previously characterized had distinct functions and lacked a common amino acid consensus sequence. Therefore, the standard computational biology tools, such as BLAST or FASTA, that are based on homology/consensus searches in protein and nucleotide sequences, could not be applied for identification of new selenoproteins that do not belong to one of the previously characterized subfamilies of selenoproteins. However, the 3′-UTRs of selenoprotein mRNAs contained a common cis-acting sequence, the SECIS element, that is necessary for Sec incorporation (18,19). Analyses of previously identified SECIS elements revealed only several conserved nucleotides, but these were located in conserved positions within the mRNA stem-loop structure (Fig. 1, A-D) (20-23). We therefore developed a computer program, SECISearch, that recognized previously characterized as well as new SECIS elements in nucleotide data bases.
The SECIS element consensus sequence, ATGAN/11-12 nucleotides/AA/18-27 nucleotides/GA, was used in the first step of SECISearch to reduce the number of sequences for further analyses. The second step involved the search for the SECIS element secondary structure that is composed of Helices I and II and the internal and apical loops (Fig. 1). However, the SECIS elements in known selenoprotein genes often contain certain "imperfections," such as bulges, mismatches, and GU/UG base pairing within helices, that complicate the description of the SECIS element secondary structure consensus. Such imperfections should be included in the algorithm for accurate description of the SECIS element, but they significantly decrease the specificity of the search criteria. This may result in an overwhelming number of false positives that contain multiple imperfections and that are unlikely to be functional SECIS elements. Experimental testing of these sequences would require enormous effort that is not justifiable.
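The primary sequence consensus quoted above translates naturally into a pattern search. The following Python sketch uses a regular expression rather than the PatScan descriptor on which SECISearch was built, so it only illustrates the first step and says nothing about secondary structure or free energy. The function name and the test sequence are hypothetical.

```python
import re

# ATGA N / 11-12 nt / AA / 18-27 nt / GA, written over the DNA alphabet.
# The lookahead reports a candidate at every start position, so candidates
# that overlap one another are not skipped.
SECIS_PRIMARY = re.compile(
    r"(?=(ATGA[ACGT][ACGT]{11,12}AA[ACGT]{18,27}GA))"
)


def primary_hits(seq):
    """Return (0-based start, matched text) for each consensus match."""
    seq = seq.upper().replace("U", "T")   # accept RNA or DNA input
    return [(m.start(), m.group(1)) for m in SECIS_PRIMARY.finditer(seq)]


# Made-up sequence with an 11-nt and an 18-nt spacer of C residues.
core = "ATGAC" + "C" * 11 + "AA" + "C" * 18 + "GA"
print(primary_hits("CC" + core + "CC"))
```

Because the spacers have variable length, a single start position can admit more than one valid alignment; the sketch reports only the first one found, which is sufficient for a yes/no prefilter of the kind described here.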
To avoid this problem, an additional step, the calculation of the minimum free energy of mRNA secondary structure, was incorporated into SECISearch. Free energies of Helix I plus internal loop and Helix II plus apical loop were determined for SECIS elements present in known selenoprotein genes, and the free energy cut-off parameters were established that allowed a great reduction in the number of pseudo-SECIS elements, while keeping the option for imperfections. Free energies were evaluated separately for Helix I plus internal loop and for Helix II plus apical loop rather than for the entire structure because the SECIS core feature, Quartet, involves non-Watson-Crick base pair interactions that could not be described by the available software. Separate free energy calculations for the upper and lower portions of the SECIS element provided two independent parameters that increased the selectivity of SECISearch analyses.
SelT (in A) and SelR (in B) gene expression. Incidence of human SelT and SelR gene expression is based on the occurrence of three or more independent cDNA clones in a particular dbEST library. In addition, EST cDNA clones with the incidence of one or two clones per dbEST library were found in libraries representing pineal gland, testis tumor, prostate, eye, heart, fetal liver spleen, white blood cells, bone, uterus, and endothelial cells (for SelT), and parathyroid tumor, spleen, brain, endothelial cells, heart, lung, testis, fibroblasts, pancreas tumor, retina, aorta, and gall bladder (for SelR).
Two versions of SECISearch, the data base search version and the on-line version, used somewhat different parameters and were designed for different purposes. The former may be used to search large nucleotide sequence data bases for a smaller number of sequences containing SECIS elements that are most similar to the descriptor SECIS element. It was designed to minimize the probability of finding pseudo-SECISes. The latter is best suited for testing a query nucleotide sequence for the presence of a potential SECIS element and was designed to minimize the probability of missing the actual SECIS element.
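The use of two independent free energy cut-offs, and the differing stringency of the two versions, can be sketched as follows. This is an illustration only: the numerical cut-offs are placeholders and are not the values used by SECISearch, and the names `EnergyCutoffs`, `DATABASE_SEARCH`, `ONLINE`, and `passes_energy_filter` are hypothetical. The free energies themselves are assumed to have been computed beforehand by an RNA folding program.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EnergyCutoffs:
    """Maximum allowed free energies (kcal/mol) for the two SECIS portions."""
    helix1_max: float  # Helix I plus internal loop
    helix2_max: float  # Helix II plus apical loop


# Placeholder values chosen for illustration, NOT taken from the paper.
# The stricter profile favors fewer pseudo-SECIS hits; the permissive
# profile favors not missing a genuine element in a single query.
DATABASE_SEARCH = EnergyCutoffs(helix1_max=-8.0, helix2_max=-12.0)
ONLINE = EnergyCutoffs(helix1_max=-4.0, helix2_max=-6.0)


def passes_energy_filter(dg_helix1, dg_helix2, cutoffs):
    """Accept a candidate only if BOTH portions are at least as stable
    (as negative) as their respective cut-offs."""
    return dg_helix1 <= cutoffs.helix1_max and dg_helix2 <= cutoffs.helix2_max


# A moderately stable hypothetical candidate: rejected by the strict
# profile, accepted by the permissive one.
print(passes_energy_filter(-5.0, -7.0, DATABASE_SEARCH))
print(passes_energy_filter(-5.0, -7.0, ONLINE))
```

Requiring both thresholds to be met, rather than a single threshold on the whole structure, mirrors the reasoning in the text: each portion is evaluated on its own, so a weak helix cannot be masked by a very stable one.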
Analyses of the sequences selected by SECISearch from dbEST revealed a high selectivity of this program for SECIS elements as evidenced by the large proportion of SECIS elements from known selenoprotein genes (Table I). On the other hand, pseudo-SECISes constituted a significant proportion of the SECISearch selection, likely due to a low degree of structure and sequence conservation observed in SECIS elements and, perhaps, due to incomplete understanding of the SECIS element structure and other features of selenoprotein genes.
To aid in recognition of SECIS elements in the set of SECISearch-selected EST sequences, we used a set of criteria, designated as MSGS,² that describes the common features found in selenoprotein nucleotide and amino acid sequences: 1) conservation of Sec and Sec-flanking areas; 2) conservation of SECIS element in the 3′-UTR; and 3) the presence of homologous protein sequences that contain Cys in place of Sec (or the presence of distinct homologs in which Sec is conserved).² Although MSGS increased the efficiency of searches, a number of the SECISearch-selected sequences could not be tested by this approach because of the lack of homologous sequences available in the data bases. This resulted in difficulties in estimating the number of false positives. Further analyses will be required to assign these sequences as true or false SECIS elements. We hope that the accumulation of nucleotide sequences in dbEST and NR as well as further adjustments in the SECISearch algorithm will result in detection of new selenoprotein sequences in the SECISearch-selected sequences. In the present searches of SECIS elements, two unique nucleotide sequences found by SECISearch satisfied all MSGS criteria. These were experimentally verified and found to encode selenoproteins.
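The way the three MSGS criteria sort candidates can be made concrete with a small sketch. It assumes each criterion has already been judged from sequence comparisons and is passed in as True, False, or None (None meaning that no homologous sequences were available to judge it). The names `MsgsVerdict` and `evaluate_msgs` are hypothetical and are not part of SECISearch.

```python
from enum import Enum
from typing import Optional


class MsgsVerdict(Enum):
    SATISFIED = "satisfied"    # all three criteria met
    FAILED = "failed"          # at least one criterion contradicted
    UNTESTABLE = "untestable"  # homologs lacking, cannot be assigned yet


def evaluate_msgs(
    sec_region_conserved: Optional[bool],
    secis_conserved: Optional[bool],
    cys_or_sec_homolog_found: Optional[bool],
) -> MsgsVerdict:
    """Combine the three MSGS criteria into a single verdict."""
    checks = (sec_region_conserved, secis_conserved, cys_or_sec_homolog_found)
    if any(c is False for c in checks):
        return MsgsVerdict.FAILED
    if any(c is None for c in checks):
        return MsgsVerdict.UNTESTABLE
    return MsgsVerdict.SATISFIED


print(evaluate_msgs(True, True, True))
print(evaluate_msgs(True, None, None))
```

Keeping a separate "untestable" outcome reflects the difficulty noted above: candidates without homologs in the data bases are neither confirmed nor rejected, which is why the number of false positives is hard to estimate.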
Nucleotide sequences for two new selenoproteins, SelT and SelR, were represented by a number of ESTs in the human dbEST (Table II), and these new selenoproteins had numerous homologous sequences in lower eukaryotes. SelT and SelR did not have homology to previously characterized proteins, but had all the features that established them as selenoproteins; i.e. their nucleotide sequences contained conserved in-frame UGA codons in the coding region and conserved SECIS elements in the 3′-UTRs. In addition, several non-mammalian organisms contained sequences that were homologous to SelT and SelR genes and contained a cysteine codon in place of UGA, a feature characteristic of mammalian selenoproteins. This feature may provide, in the future, a basis for an alternative approach to search for selenoprotein genes.
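As a rough illustration of how the Cys-in-place-of-Sec feature might be used, the sketch below scans a pairwise protein alignment for columns in which the query has Sec (written U) and the homolog has Cys (C). It assumes the two sequences have already been aligned to equal length with "-" for gaps. The function name `sec_cys_positions` and the toy sequences are hypothetical and are not SelT or SelR sequences.

```python
def sec_cys_positions(query_aln, homolog_aln):
    """Return alignment columns where the query has Sec (U) and the
    homolog has Cys (C)."""
    if len(query_aln) != len(homolog_aln):
        raise ValueError("aligned sequences must have equal length")
    pairs = zip(query_aln.upper(), homolog_aln.upper())
    return [i for i, (q, h) in enumerate(pairs) if q == "U" and h == "C"]


# Toy aligned fragments: column 3 pairs U in the query with C in the homolog.
query = "ACGUGK-L"
homolog = "ACGCGKTL"
print(sec_cys_positions(query, homolog))
```

A hit of this kind is only suggestive; conservation of the flanking residues and of a SECIS element in the 3′-UTR would still have to be checked.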
Experimental evidence for the presence of SelT and SelR in mammals included expression of the selenoproteins in a monkey CV-1 cell line as fusion proteins with GFP and detection of selenium in the expressed proteins by metabolic labeling with 75Se. Immunoblot detection of SelT confirmed the occurrence of this selenoprotein in mammalian cells. Additional evidence for the presence of Sec in SelT is the selenium-dependent regulation of SelT biosynthesis. Selenium is often a limiting factor in selenoprotein synthesis, and selenium supplementation results in elevation of selenoprotein levels in mammalian tissues and cell cultures (29). Using immunoblot analyses, we recently found that the addition of selenium to a mammalian cell culture results in increased expression of SelT.³
An interesting feature observed for SelR was the presence of its homologs in all genomes for which the complete nucleotide sequence is available. This suggests that SelR homologs are present in all living organisms. Although the function of SelR is not known, the presence of a SelR homolog in a minimal gene set for cellular life suggests the importance of SelR for one of the basic processes in cellular metabolism. It should be noted that disruption of the mouse Sec-tRNA gene resulted in early embryonic lethality (28), and this effect is likely due to the loss of expression of one or more selenoproteins. Further gene knockout studies may be necessary to determine whether SelR is one of the selenoproteins essential for development. Two Cys-containing homologs of SelR (Fig. 4B) were also detected in mammals, and it is not known whether SelR deficiency may be compensated for in mammalian cells.
In conclusion, we describe a method for identifying new selenoprotein genes in nucleotide data bases. These genes could be found with the help of a computer program, SECISearch, that recognized SECIS elements in selenoprotein genes. We applied this program to search dbEST and identified two new selenoprotein genes with no homology to known proteins. In addition, we provided experimental evidence for the natural occurrence of these proteins. These data demonstrate that selenoprotein genes can be identified through recognition of SECIS elements, and we suggest that this new method will be useful for identification of selenoprotein sequences in current and future large-scale sequencing projects. Identification of new selenoproteins may help in explaining many biological effects of selenium.