A New Human Selenium-containing Protein

Selenium, which occurs in proteins as the amino acid selenocysteine, is essential for numerous biological processes and for human health. A prominent 75Se-labeled protein detected in human T-cells migrated as a 15-kDa band by SDS-polyacrylamide gel electrophoresis. This protein subunit was purified and subjected to tryptic digestion and peptide sequence analyses. Sequences of tryptic peptides derived from the protein corresponded to a human placental gene sequence containing an open reading frame of 162 residues and a readthrough in-frame TGA codon. Three different peptide sequences of the 15-kDa protein corresponded to a nucleotide sequence located downstream of this codon, suggesting that the T-cell 15-kDa selenoprotein contains a selenocysteine residue encoded by TGA. Post-translational processing of the N-terminal portion of the predicted gene product to give the 15-kDa protein was suggested on the basis of molecular mass, amino acid analysis, and immunoblot assays of the purified protein. The 3′-untranslated region (UTR) of the gene encoding the 15-kDa protein contained a sequence that is very similar to the canonical selenocysteine-inserting sequence element. Computer analysis of transcript map data bases indicated that this gene was located on human chromosome 1. Its coding sequence showed no homology to known protein-encoding genes. The 15-kDa protein gene was expressed as mRNA in a wide range of tissues, with increased levels in thyroid, parathyroid, and prostate-derived cells, as evidenced by searches of partial cDNA sequences in public data bases. Genes corresponding to the 15-kDa selenocysteine-containing protein were found in mice and rats, while the corresponding genes in Caenorhabditis elegans and Brugia malayi contained a cysteine codon in place of TGA. The discovery of a new human selenoprotein provides an additional example of the role of selenium in mammalian systems.

Selenium has been implicated in immunological function and many other biological processes through various nutritional and biochemical studies (1,2). This trace element is a natural component of several prokaryotic and eukaryotic proteins. Although selenium occurs in prokaryotic proteins either as a cofactor or as a selenocysteine residue, mammalian selenoproteins identified thus far contain selenium only in the form of selenocysteine, which is the 21st naturally occurring amino acid in protein. A selenocysteine tRNA that decodes UGA has been found in all life kingdoms, suggesting that the use of UGA as a codon for selenocysteine is widespread in nature (2). The special conserved stem-loop structures in the 3′-untranslated regions of mammalian selenoprotein mRNAs are essential for recognition of UGA as a codon for selenocysteine, rather than a codon for termination of translation (3).
Selenocysteine is located at the active center and is directly involved, or at least implicated, in the reactions catalyzed by glutathione peroxidases, thyroid hormone deiodinases, and selenophosphate synthetase 2. Thioredoxin reductase contains selenocysteine (13) in a novel C-terminal Gly-Cys-Sec-Gly redox motif (14). This center has been implicated in the peroxidase reaction catalyzed by the enzyme (14) and in a redox interaction with the N-terminal redox disulfide (15), although further studies are necessary to prove the suggested essential role of selenocysteine in this protein. Selenoprotein P, a protein of unknown function, is unusual in that it contains ten selenocysteine residues. The function of selenoprotein W also remains unknown.
In the present study, we describe the isolation of the 15-kDa protein and its gene sequence, which together characterize this protein as a member of a class of mammalian selenoproteins. Although this new human selenoprotein does not have homology to known proteins, genes putatively encoding this protein occur in other mammals as well as in nematodes and plants. Preliminary data on this protein have been presented elsewhere (16).

EXPERIMENTAL PROCEDURES
Materials-[75Se]Selenious acid was obtained from the Research Reactor Facility, University of Missouri (Columbia, MO); ECL systems were from Amersham Pharmacia Biotech; EST clones were from ATCC (number 384717 from a human placental cDNA library and number 409024 from a human infant brain cDNA library); and other reagents were commercial products of the highest grade available.
Cell Growth and Protein Purification-A human Jurkat T-cell line, JPX9 (17), was grown and labeled with [75Se]selenious acid (2 Ci/ml) as described (14). 75Se-labeled JPX9 cells were mixed with unlabeled cells, suspended in 2 volumes of 30 mM Tris-HCl, pH 7.5, 1 mM EDTA, 2 mM dithiothreitol, 1 mM MgCl2, 1 mM phenylmethylsulfonyl fluoride, and disrupted by sonication. Disrupted cells were centrifuged; the supernatant was applied to a DEAE-Sepharose column, which had been equilibrated with 30 mM Tris-HCl, pH 7.5, 2 mM dithiothreitol, and 1 mM EDTA (buffer A); the column was washed with 2 volumes of buffer A; and proteins were eluted by application of a linear gradient from 0 to 500 mM NaCl in buffer A. Fractions containing 75Se were analyzed on SDS gels. Fractions containing the 15-kDa selenoprotein that eluted from the DEAE column with 350 mM NaCl were combined, concentrated, adjusted to a concentration of 0.5 M ammonium sulfate in buffer A, and applied to a phenyl-Sepharose column equilibrated in 1 M ammonium sulfate in buffer A; the column was washed by application of a linear gradient from 0.5 to 0 M ammonium sulfate in buffer A; and radioactive fractions corresponding to the 15-kDa protein were eluted by application of a linear gradient from buffer A to water. Radioactive fractions were combined, concentrated, and loaded on a C18 reversed-phase high performance liquid chromatography column that had been equilibrated in 0.05% trifluoroacetic acid; a gradient of 0 to 60% acetonitrile in 0.05% trifluoroacetic acid was applied, and 75Se-containing fractions corresponding to the 15-kDa protein eluted at 48% acetonitrile.
Characterization of the 15-kDa Selenoprotein-Fractions containing the 15-kDa selenoprotein from the C18 column were dried on a Speed-Vac SC110 (Savant), dissolved in SDS-PAGE sample buffer, and analyzed by SDS-PAGE. The molecular mass of the 15-kDa selenoprotein was determined by electrospray and MALDI mass-spectrometry in fractions from the C18 column. Both mass spectra revealed a single strong signal of the 15-kDa protein. The native molecular mass of the 15-kDa selenoprotein purified on a DEAE-Sepharose column was determined using native PAGE and analytical high performance liquid chromatography gel filtration as described (18). The 15-kDa selenoprotein was detected as 75Se-labeled fractions from a gel-filtration column and as a 75Se-labeled band on native PAGE. For determination of sequences of internal peptides, the 15-kDa selenoprotein was separated by SDS-PAGE, transferred onto a polyvinylidene difluoride membrane, and stained with Ponceau S. The stained single 15-kDa band was removed and submitted to Harvard Microchem (Boston, MA), where the 15-kDa selenoprotein was digested with trypsin and the resulting peptides were separated on a reversed-phase high performance liquid chromatography column and analyzed by MALDI mass-spectrometry and amino acid sequencing. Amino acid composition of the 15-kDa protein was also determined at Harvard Microchem. These data were carefully analyzed and found to be consistent with other experimental data (mass spectrometry, SDS-PAGE analyses, immunoblot assays).
Immunoblot Detection-Rabbit antibodies were raised against a synthetic polypeptide corresponding to residues 145-162 of the human 15-kDa protein and used for Western blot analyses with ECL systems.
cDNA Sequencing-Plasmids were isolated according to the instructions provided with the plasmid purification kit (Qiagen), the sequencing reaction products were purified on separation columns as described by the manufacturer (Princeton Separations), and the nucleotide sequences of EST clones were determined using a Dye Terminator Cycle Sequencing kit as described by the manufacturer (Perkin-Elmer).
Computer Analyses-Three different peptide sequences from the 15-kDa selenoprotein were analyzed for matches to the data base of expressed sequence tags (dbEST) of partial cDNA sequences (19) using the BLAST (20, 21) and gapped BLAST-2 (22) programs. Multiple alignments of expressed sequence tag (EST) sequences and their translated products were viewed using the MSPcrunch/Blixem system (23). The Blixem alignments also revealed polymorphic sites in the human ESTs that were clearly distinct from sequencing errors.

Detection, Purification, and Characterization of the 15-kDa Human Selenoprotein-A human T-cell line, JPX9, was grown in the presence of 75Se, and the extracts of 75Se-labeled cells were analyzed by SDS-PAGE and PhosphorImager detection of radioactivity on the gels. One of the major 75Se-labeled proteins that migrated as a 15-kDa band on SDS-PAGE was purified initially on DEAE-Sepharose and phenyl-Sepharose columns and then further on a reversed-phase column as described under "Experimental Procedures." The resulting protein preparation was analyzed by SDS-PAGE. A major protein band migrating as a 15-kDa species was detected by staining the gels with Coomassie Blue and by PhosphorImager detection of radioactivity in the gel (Fig. 1). The purity of the 15-kDa protein was estimated to be about 50%. The native 15-kDa selenoprotein migrated as a 240-kDa species on a native gradient PAGE and was eluted as a 200-kDa species from a calibrated gel-filtration column (not shown). It was not determined whether the native 200-240 kDa protein is composed of multiple identical 15-kDa Se-containing subunits or is a hetero-oligomeric complex. The molecular mass of the 15-kDa selenoprotein subunit in fractions from the C18 column determined by MALDI mass-spectrometry was 14,830 Da. Electrospray mass-spectrometry of the same preparation revealed a molecular mass of 14,870 Da. The N terminus of the protein was blocked, which prevented determination of the N-terminal sequence. Amino acid analysis revealed the lack of internal methionine and histidine residues, as well as the hydrophobic character of the protein (Table I).
Sequences of Tryptic Peptides-The sequences of three different tryptic peptides and one overlapping peptide from the 15-kDa protein were determined (Fig. 2). Computer searches of the partial cDNAs in dbEST using the TBLASTN program revealed nucleotide sequences that corresponded to all three peptides in the same ORF.
Characterization of cDNA and Polypeptide Sequences-Several human dbEST sequences corresponded perfectly to peptides from the 15-kDa protein and were used to assemble an open reading frame (Fig. 2). The two cDNA clones containing the longest 5′ sequences were obtained from IMAGE and sequenced. These clones revealed a continuous nucleotide sequence of 1268 nucleotides, containing a single open reading frame of 162 amino acid residues and a 3′-end poly(A) tail. A single ATG codon occurred in a nucleotide context, GCGATGG, that is similar to the Kozak consensus sequence for initiation of translation (24). This initiation ATG codon was followed by a 489-nucleotide open reading frame with an in-frame TAA termination codon. The obtained ORF included an in-frame TGA codon, suggesting the presence of a selenocysteine residue, Sec-93 (2). Three tryptic peptides for which sequences have been determined corresponded to deduced sequences located downstream of the TGA codon, indicating readthrough of the TGA codon rather than termination of translation. Although selenocysteine was not directly identified as a component of the 15-kDa protein, the labeling of the protein with 75Se, readthrough of the TGA codon, and the location of a selenocysteine insertion sequence (SECIS) element in the 3′-untranslated region (Fig. 2) suggest the presence of selenocysteine in the protein. The predicted ORF encoded a protein of 17,790.6 Da. The mass of the purified 15-kDa protein was 14,870 Da, and this discrepancy suggested post-translational processing of the protein. Processing of the 15-kDa protein appears to occur at the N-terminal portion of the protein. That is, antiserum was raised to a synthetic peptide that was identical in sequence to the eighteen C-terminal residues of the 15-kDa protein, and thus the antigenic site resides near the C terminus. As shown in Fig. 1, this antiserum recognized the 15-kDa protein at different stages of purification. In addition, one of the sequenced tryptic peptides obtained from digests of the 15-kDa protein corresponded to residues 146-158, which are located near the C terminus according to the predicted gene sequence.
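The readthrough interpretation can be illustrated with a minimal Python sketch that translates an open reading frame either with an in-frame TGA decoded as selenocysteine (written "U") or with TGA treated as a stop codon. The function name `translate` and the toy sequence are hypothetical illustrations; the toy sequence is not the 15-kDa protein cDNA, and the sketch is not the software used in this study. Only the standard genetic code is assumed.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(orf: str, sec_readthrough: bool = True) -> str:
    """Translate an ORF starting at its first base.

    If sec_readthrough is True, an in-frame TGA is decoded as
    selenocysteine ("U"); otherwise it terminates translation.
    """
    orf = orf.upper()
    protein = []
    for i in range(0, len(orf) - 2, 3):
        codon = orf[i:i + 3]
        residue = CODON_TABLE[codon]
        if codon == "TGA" and sec_readthrough:
            residue = "U"
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


# Hypothetical toy ORF: Met-Ala-TGA-Gly-Lys-stop (NOT the real cDNA).
toy_orf = "ATGGCTTGAGGTAAATAA"
with_sec = translate(toy_orf, sec_readthrough=True)
without_sec = translate(toy_orf, sec_readthrough=False)

# Length check from the text: 162 codons plus one stop codon.
orf_length_nt = 162 * 3 + 3
```

Peptides that map downstream of the TGA codon, as the three tryptic peptides do here, can only arise under the first decoding. The final line reproduces the arithmetic behind the reported 489-nucleotide open reading frame.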
The N-terminal portion of the putative precursor of the 15-kDa protein, as predicted from the gene sequence, had a stretch of hydrophobic amino acid residues, suggesting the presence of a signal peptide. Cleavage of these N-terminal amino acid residues is consistent with the amino acid composition of the protein (Table I), since the processed protein matches the amino acid analysis data obtained for the purified 15-kDa protein more closely than does the full-size 17-kDa protein. One possible site for post-translational processing is Ser-27, which coincides with the site of an exon-intron junction (not shown), making this residue an evolutionarily favorable site for post-translational processing. The unknown nature of the N-terminal modifying group in the purified protein makes it difficult to predict the exact site of post-translational cleavage.
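As a rough consistency check on the proposed N-terminal processing, the difference between the predicted and measured masses can be converted into an approximate number of removed residues. The sketch below assumes an average residue mass of 110 Da, a common rule of thumb that is not taken from this study, and it ignores the unknown N-terminal blocking group; the names are hypothetical.

```python
# Masses reported in the text (Da).
PREDICTED_MASS_DA = 17790.6      # full-length product of the 162-residue ORF
MEASURED_ESI_DA = 14870.0        # electrospray mass spectrometry
MEASURED_MALDI_DA = 14830.0      # MALDI mass spectrometry

# Assumption: average mass of one amino acid residue (rule of thumb).
AVG_RESIDUE_MASS_DA = 110.0


def residues_removed(predicted: float, measured: float,
                     avg_residue: float = AVG_RESIDUE_MASS_DA) -> float:
    """Estimate how many residues account for the mass difference."""
    return (predicted - measured) / avg_residue


esi_estimate = residues_removed(PREDICTED_MASS_DA, MEASURED_ESI_DA)
maldi_estimate = residues_removed(PREDICTED_MASS_DA, MEASURED_MALDI_DA)
print(f"ESI: ~{esi_estimate:.1f} residues, MALDI: ~{maldi_estimate:.1f} residues")
```

Both estimates fall near 27 residues, which is compatible with cleavage in the vicinity of Ser-27. Because the average residue mass is only approximate and the modifying group adds an unknown mass, the calculation cannot pinpoint the cleavage site.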
Homologous Mouse, Rat, Brugia malayi, Caenorhabditis elegans, and Rice Gene Sequences-Computer sequence analyses of the 15-kDa protein and its gene sequence revealed no homology to known proteins. However, a number of dbEST sequences from mouse, rat, B. malayi, C. elegans, and rice showed strong homology in TBLASTN searches with the 15-kDa human protein (Fig. 3). Mouse partial cDNA sequences were assembled into a full-length open reading frame, whereas rat, C. elegans, B. malayi, and rice sequences gave only partial coding sequences. Interestingly, although mouse and rat genes encode potential selenocysteine-containing 15-kDa proteins, the genes in C. elegans and B. malayi encode homologous proteins containing cysteine in place of selenocysteine. This is consistent with observations that nematode genes for glutathione peroxidase and thioredoxin reductase encode cysteine analogs of mammalian selenocysteine-containing proteins.
The regions flanking Sec-93 in the human 15-kDa protein had the highest degree of homology among proteins from different organisms, suggesting that the selenocysteine residue is located in a putative active center. In other mammalian selenocysteine-containing proteins whose function is established, the selenocysteine residue is located at the active center, and it is essential for catalytic activity of the selenoenzyme (1,2). In addition to partial cDNA sequences summarized in Fig. 3, we have detected additional homologous human and mouse cDNAs relatively distantly related to the sequence for the human 15-kDa protein. These encode hypothetical proteins that have in-frame TGA in the position corresponding to the TGA encoding selenocysteine in the 15-kDa selenoprotein gene. The sequences of human and mouse genes for this protein were recently experimentally verified.² These observations suggest that the 15-kDa selenoprotein establishes a new class of eukaryotic selenium-containing proteins.
Gene Expression-Approximately 120 partial cDNA sequences in dbEST were found to match the human 15-kDa protein DNA sequence (within experimental error or expected frequencies of natural polymorphism). This sampling represents a sufficient abundance of independent clones to reveal the approximate tissue distribution of expression of this relatively highly expressed gene (expression as mRNA). cDNA libraries from 32 different adult, fetal, or embryonic tissues or organs were represented in this set of sequences. Table II shows the ranked incidence of these clones in tissues and organs for which at least one library has two or more independent 15-kDa protein cDNAs in dbEST.
Clearly, the 15-kDa protein gene exhibits a very broad spectrum of moderate expression in many tissues, and significantly higher levels of mRNA are shown by thyroid, parathyroid tumor, prostate, and pre-cancerous prostate cells. Expression estimates from dbEST library frequencies should be considered to be only semi-quantitative, considering that some libraries are normalized and variable levels of tissue contamination may exist. More quantitative representative estimates are given by the stringent CGAP (Cancer Gene Anatomy Project) libraries (25) prepared from small numbers of laser-microdissected cells, for example the pre-cancerous prostate library CGAP_Pr2 (Ref. 26; Table II). Irrespective of the quantitative uncertainties, this large body of partial cDNA sequence data strongly demonstrates that the 15-kDa protein gene is expressed in a wide range of tissues, with increased levels of mRNA in the thyroid, parathyroid, and prostate-derived cells. Although in this report we describe the human 15-kDa protein gene expression as the expression of mRNA inferred from data base analysis, we detected the mouse analog of this human selenoprotein in immunoblot assays in prostate, heart, kidney, spleen, liver, and other mouse organs, with the highest level observed in prostate, suggesting the expression of both mRNA and the selenoprotein in many tissues and cell lines.²
Selenocysteine Insertion Sequence Element-Studies of the mechanism of selenocysteine incorporation into several eukaryotic selenoproteins have implicated related stem-loop structures, located in the mRNA 3′-UTR, as essential for selenocysteine insertion into proteins at a UGA codon in the coding sequence. The general structural features of this SECIS (selenocysteine insertion sequence) element have been deduced previously (3,27), based on chemical probe experiments and sequence alignments, as summarized in Fig. 4a.

² V. Gladyshev and D. Hatfield, unpublished data.

TABLE II
Incidence of the human 15-kDa protein gene expression
Incidence of the human 15-kDa protein gene expression is based on the occurrence of two or more independent cDNA clones in dbEST libraries. In addition, 17 libraries from other tissues, including 3 distinct embryo libraries, contained only a single 15-kDa protein cDNA clone and are not tabulated here. For some clones, both 5′ and 3′ EST sequences are present in dbEST; these count as only a single cDNA in these calculations.

FIG. 3. Alignment of the human 15-kDa selenoprotein sequence with homologs from mouse, nematodes, and rice. The amino acid sequence of the mouse protein was deduced from the assembly of 39 independent partial cDNA sequences in dbEST. In addition, experimental confirmation of the 5′ region encoding the mouse N-terminal sequence was made from partial cDNAs obtained from the IMAGE consortium. The C. elegans sequence was assembled from two partial cDNA clones (GenBank™ dbEST accession numbers C10051 and C08344) that are identical for an 81-base-pair region of overlap and encode the apparently complete reading frame shown. The partial amino acid sequence of the homolog from the filarial nematode, B. malayi, was translated from a single partial cDNA (GenBank™ dbEST accession number AA257328). Two rice partial cDNAs (GenBank™ dbEST accession numbers D47693, D47819) covered the translated region shown (in addition, shorter segments of similarity to the human sequence were noted in translations further downstream, but these were in error-prone regions of mismatch between the two ESTs and are not shown). All pairwise alignments were strongly significant, as shown by TBLASTX-2 (Washington University gapped blast, February 1997 release obtained from ftp://blast.wustl.edu/blast/executables). Typical EST pairs gave amino acid gapped E (expect) values (BLOSUM 62 matrix), using the sum statistics of Altschul and Gish (22) as follows (with the highest HSP score appended in parentheses): human/mouse, 2 × 10^-35 (717); human/C. elegans, 2 × 10^-20 (252); human/B. malayi, 8 × 10^-12 (228); C. elegans/B. malayi, 8 × 10^-21 (257); human/rice (including multiple short matches for scoring purposes), 1 × 10^-2 (82).
To locate potential SECIS elements in the 15-kDa protein mRNAs, the human and mouse cDNAs were searched for sequences meeting the following constraints (see Fig. 4a): Helix I, at least 4 base pairs; Internal loop, 3-9 nucleotides; Quartet (the non-Watson-Crick base-paired motif), UGAN (following A in Internal loop) . . . NGAN (following the downstream strand of Helix II); Helix II, 9-15 standard base pairs extending the Quartet; Apical loop, 10-20 nucleotides starting with AA(A/G). Single base mismatches or bulges were allowed within helices longer than 6 base pairs. Sequences meeting these stringent criteria were found in both the human and mouse 3′-UTRs, ending approximately 60 nucleotides upstream of the poly(A) addition signal sequence (Fig. 2). Fig. 4b shows these human and mouse sequences aligned with the canonical SECIS element (3,27) of the human glutathione peroxidase 1 (GPX-1) mRNA 3′-UTR. The 15-kDa protein mRNAs exhibit all the features known to be necessary in other eukaryotic selenoprotein mRNAs to promote selenocysteine insertion. It was also noted from the alignments of multiple human sequences from dbEST that a probable G/A substitution polymorphism occurred at an Apical loop nucleotide (position 1125, Fig. 4b). Examination of the primary sequencer trace data (openly available on the Washington University server, http://genome.wustl.edu/est/est_general/trace_intro.html) confirmed that authentic G or A bases occur at this position and are not sequencing errors. An additional site of substitution (C/T) polymorphism was observed at the 811 position of the human 15-kDa protein cDNA sequence (not shown).
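The search constraints listed above can be expressed as a simple checklist. The following Python sketch is an illustration only: it assumes that a candidate stem-loop has already been decomposed into its parts (for example by an RNA folding step not shown here), it reads "following A in Internal loop" as meaning that the Internal loop ends in A, and it omits the allowance for single mismatches or bulges in long helices. The class `SecisCandidate`, the function `check_secis`, and the example values are hypothetical and do not represent the human or mouse sequences.

```python
import re
from dataclasses import dataclass


@dataclass
class SecisCandidate:
    """Pre-parsed parts of a candidate SECIS stem-loop (RNA alphabet)."""
    helix1_bp: int        # base pairs in Helix I
    internal_loop: str    # 5' strand of the Internal loop
    quartet_5p: str       # upstream strand of the Quartet (UGAN)
    quartet_3p: str       # downstream strand of the Quartet (NGAN)
    helix2_bp: int        # standard base pairs in Helix II
    apical_loop: str      # Apical loop sequence


def check_secis(c: SecisCandidate) -> dict:
    """Return a pass/fail flag for each constraint stated in the text."""
    return {
        "helix_I": c.helix1_bp >= 4,
        "internal_loop": 3 <= len(c.internal_loop) <= 9
        and c.internal_loop.endswith("A"),
        "quartet": re.fullmatch(r"UGA[ACGU]", c.quartet_5p) is not None
        and re.fullmatch(r"[ACGU]GA[ACGU]", c.quartet_3p) is not None,
        "helix_II": 9 <= c.helix2_bp <= 15,
        "apical_loop": 10 <= len(c.apical_loop) <= 20
        and re.match(r"AA[AG]", c.apical_loop) is not None,
    }


# Hypothetical candidate that satisfies every constraint.
good = SecisCandidate(5, "CUAA", "UGAU", "UGAU", 11, "AAAGGCUCUGAC")
# Same candidate, but the Apical loop does not start with AA(A/G).
bad = SecisCandidate(5, "CUAA", "UGAU", "UGAU", 11, "CAAGGCUCUGAC")

good_result = check_secis(good)
bad_result = check_secis(bad)
```

Returning one flag per constraint, rather than a single boolean, makes it easy to see which structural feature excludes a candidate.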
Chromosomal Localization of the Gene for the 15-kDa Selenoprotein-Computer analyses revealed that the UNIGENE cluster of ESTs (28) corresponding to the 15-kDa human selenoprotein maps to human chromosome 1, at position 117-123 cM on the human transcript gene map, corresponding approximately to 1p31 (29).
Concluding Remarks-We have detected a new selenoprotein in human T cells designated as the 15-kDa protein. Although the function of this protein is not known, certain selenoproteins have been associated with a chemopreventive role in cancer (2). Interestingly, there are certain correlations between the occurrence of the newly described 15-kDa protein in cancer tissue and the effect of selenium on cancer that are worthy of discussion. For example, recent studies have shown that supplementation of the diet with selenium in human clinical trials resulted in a 63% reduction in prostate cancer and, to a lesser extent, in reductions in colon and lung cancers (30). Interestingly, the 15-kDa protein is highly expressed in prostate tissue (see Table II), and thus some of the protective properties of selenium in prostate cancer may be mediated through this protein. In addition, our preliminary data suggest that the levels of the 15-kDa protein and its mRNA are decreased in some cancers (e.g. liver of c-myc/TGFα transgenic mice²). Studies to determine whether the 15-kDa protein may play a role in delaying the progression of certain cancers, including prostate cancer, are in progress.

FIG. 4. SECIS elements. a, general features of eukaryotic SECIS elements used to identify a matching element in the 3′-UTRs of the mRNAs encoding human and mouse 15-kDa selenoproteins (see text). The helical segment of four non-Watson-Crick base pairs is labeled Quartet. b, alignment of the predicted SECIS elements of the human and mouse mRNAs encoding the 15-kDa protein with a typical experimentally verified example (human GPX-1). In helical stems, single base bulges or mismatches are shown by gaps in the arrows. A lowercase a residue above the human apical loop sequence indicates a polymorphism at position 1125.