Identification and Integrative Analysis of 28 Novel Genes Specifically Expressed and Developmentally Regulated in Murine Spermatogenic Cells

Mammalian spermatogenesis is a highly ordered process that proceeds through mitotic, meiotic, and postmeiotic phases. The unique mechanisms responsible for this tightly regulated developmental process suggest the presence of an intrinsic genetic program composed of spermatogenic cell-specific genes. In this study, we analyzed the mouse round spermatid UniGene library, which currently contains 2124 gene-oriented transcript clusters, predicted that 467 of them are testis-specific genes, and systematically identified 28 novel genes with evident testis-specific expression by in silico and in vitro approaches. We analyzed these genes by Northern blot hybridization and cDNA cloning, demonstrating the presence of additional transcript sequences in five genes and multiple transcript isoforms in six genes. Genomic analysis revealed no human orthologues for 10 genes, implying a relationship between these genes and male reproduction that is unique to mouse. We found that all of the novel genes are expressed in developmentally regulated, stage-specific patterns, suggesting that they are primary regulators of male germ cell development. Using computational bioinformatics tools, we found that 20 gene products are potentially involved in various processes

In sexual reproduction, diploid cells divide to form haploid cells, and haploid cells from two individuals fuse at fertilization to form new diploid cells. This process, which produces genetically diverse offspring, requires intricate molecular and cellular events such as genetic recombination and the formation of gametes specialized for fertilization. Male germ cell development, or spermatogenesis, is a tightly regulated developmental process that proceeds through successive mitotic, meiotic, and postmeiotic phases (1-3). The process takes place in the epithelium lining the seminiferous tubules of the testis. Spermatogonial stem cells, located at the periphery of the tubule next to the surrounding basal lamina, divide mitotically, and some of their progeny differentiate into later stage spermatogonia that gradually become primary spermatocytes. These cells complete the first meiotic division to become secondary spermatocytes, which rapidly undergo the second meiotic division to produce spermatids. These haploid spermatids are then remodeled into spermatozoa during spermiogenesis, whose major events are chromatin condensation and morphological change. This tightly regulated process, with its meiotic progression and drastic changes in cell morphology, suggests the presence of a highly organized network of genes expressed during spermatogenesis. Gene expression during spermatogenesis is regulated at three levels: intrinsic, interactive, and extrinsic (3). The intrinsic program determines which genes are used and when they are expressed. The interactive process between germ cells and somatic cells is necessary for germ cell proliferation and progression. Extrinsic influences such as steroid and peptide hormones regulate the interactive process. Of these three levels, the intrinsic genetic program involves germ cell- and stage-specific gene expression.
Recent high-throughput genomics projects have focused on identifying cell- and tissue-specific transcriptomes, which are expected to yield fundamental insights into biological processes. Data bases of expressed sequence tags (ESTs) provide important information for the discovery of novel genes with tissue-specific expression profiles. One such data base is UniGene, which provides information on gene-oriented clusters and the tissue types in which genes are expressed. ESTs in UniGene are organized into clusters; each sequence in a cluster overlaps at least one other member of the same cluster but no member of any other cluster. Thus, each cluster is likely to correspond to a single gene. In silico biology is rapidly becoming a powerful tool in modern biotechnology. The UniGene data base, combined with other computational bioinformatics data bases, provides a wealth of information for predicting the tissue specificity of gene expression, the genomic organization of genes, and the structure and function of novel gene products.
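The clustering rule above means that two ESTs can share a cluster without overlapping each other directly, as long as a chain of overlapping members links them; in graph terms, clusters are the connected components of an overlap graph. The following minimal Python sketch illustrates only that grouping step, assuming the pairwise overlaps have already been computed. The identifiers are hypothetical, and the actual UniGene build involves further processing that is not modeled here.

```python
from collections import defaultdict


def cluster_ests(est_ids, overlaps):
    """Group ESTs into clusters = connected components of the overlap graph.

    est_ids:  iterable of sequence identifiers (hypothetical names).
    overlaps: iterable of (a, b) pairs already judged to overlap (assumed input).
    """
    parent = {e: e for e in est_ids}

    def find(x):
        # Follow parent links to the cluster root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in overlaps:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b  # merge the two clusters

    clusters = defaultdict(list)
    for e in parent:
        clusters[find(e)].append(e)
    # Sort members and clusters so the output is deterministic.
    return sorted(sorted(members) for members in clusters.values())


ests = ["est1", "est2", "est3", "est4", "est5"]
pairs = [("est1", "est2"), ("est2", "est3"), ("est4", "est5")]
print(cluster_ests(ests, pairs))
# [['est1', 'est2', 'est3'], ['est4', 'est5']]
```

Here est1 and est3 never overlap directly but end up in the same cluster through est2. A union-find structure is a natural choice because each overlap pair can be merged in near-constant time.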
A comprehensive understanding of spermatogenesis requires the identification and functional characterization of unique genes, since this developmental process is governed by precisely programmed cell- and stage-specific gene expression. In this study, using one of the mouse spermatogenic cell UniGene libraries, we discovered a number of novel testicular genes. Our expression analyses suggest that these genes are specifically expressed and developmentally regulated in mouse spermatogenic cells. Further, we predict that the proteins encoded by these genes have significant functions in various processes during spermatogenesis or fertilization. This study is distinctive in its systematic identification and in-depth characterization of a number of novel genes implicated in the intrinsic genetic program that determines the sequence of events in male germ cell development.

EXPERIMENTAL PROCEDURES
RT-PCR-Total RNA was isolated from tissues or cells using a Micro-to-Midi Total RNA Purification System (Invitrogen), and cDNA was then synthesized with random hexamer and oligo(dT) priming using Omniscript reverse transcriptase (Qiagen). To determine the tissue distribution of novel gene expression, PCR was performed using cDNAs from multiple tissues (skeletal muscle, brain, lung, heart, liver, kidney, testis, and spleen) of male mice. To test whether transcripts are expressed at particular stages of spermatogenesis, prepubertal and adult male mice (8-90 days old) were sacrificed, and total RNA isolated from their testes was used for reverse transcription. To investigate whether the novel genes are expressed in testicular somatic cells, RT-PCR was performed using total RNA isolated from a Sertoli cell line (4) or from the germ cell-lacking testes of W/Wv mutant mice (5). Gene-specific primers are listed in Table I. PCR was performed for 30 cycles of 94°C for 30 s, 55°C for 30 s, and 72°C for 1 min. The control primers for glyceraldehyde-3-phosphate dehydrogenase were: forward, 5′-TGA AGG TCG GAG TCA ACG GAT TTG GT-3′; reverse, 5′-CAT GTG GGC CAT GAG GTC CAC CAC-3′.
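As a quick sanity check on the control primers listed above, their GC content and an approximate melting temperature can be computed directly from the sequences. The sketch below uses the basic formula Tm = 64.9 + 41(nGC − 16.4)/N, a rough estimate that ignores salt and oligonucleotide concentrations; it is an illustration, not the method used to design the primers in this study, whose design criteria are not given here.

```python
def gc_fraction(primer):
    """Fraction of G or C bases in a primer (spaces ignored)."""
    seq = primer.replace(" ", "").upper()
    return sum(base in "GC" for base in seq) / len(seq)


def tm_basic(primer):
    """Rough Tm in deg C: 64.9 + 41 * (nGC - 16.4) / N.

    Ignores salt and oligonucleotide concentrations; indicative only.
    """
    seq = primer.replace(" ", "").upper()
    n_gc = sum(base in "GC" for base in seq)
    return 64.9 + 41 * (n_gc - 16.4) / len(seq)


# G3PDH control primers as listed in the text (5' -> 3')
G3PDH_FWD = "TGA AGG TCG GAG TCA ACG GAT TTG GT"
G3PDH_REV = "CAT GTG GGC CAT GAG GTC CAC CAC"

for name, primer in [("forward", G3PDH_FWD), ("reverse", G3PDH_REV)]:
    print(f"{name}: {gc_fraction(primer):.1%} GC, Tm ~{tm_basic(primer):.1f} C")
# forward: 50.0% GC, Tm ~59.5 C
# reverse: 62.5% GC, Tm ~62.5 C
```

Both estimates lie above the 55°C annealing step used in the PCR program.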
Northern Blot Analysis-Total RNA was isolated from each tissue using TRI Reagent (Molecular Research Center, Inc.), heated at 65°C for 5 min, and separated on a 1.2% agarose gel containing 1.8% formaldehyde. The gels were washed extensively in water to remove formaldehyde, and the RNA was transferred to a Hybond-XL membrane (Amersham Biosciences). Each Northern blot included two 10-μg RNA samples, extracted from the testis and liver of male mice. The blots were prehybridized for 30 min at 68°C in Rapid-hyb buffer (Amersham Biosciences) and then hybridized for 2 h at 68°C with the cDNA probe. Probes were derived from PCR products amplified with gene-specific primers (Table I) and labeled with [α-32P]dCTP (PerkinElmer Life Sciences) using the Prime-It random priming kit (Stratagene). The blots were washed four times in 2× SSC, 0.05% SDS at room temperature for 10 min and twice in 0.1× SSC, 0.1% SDS at 68°C for 10 min. The blots were exposed to Hyperfilm (Amersham Biosciences) with intensifying screens at −70°C.
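Transcript sizes on Northern blots are commonly estimated from a size marker run on the same gel, using the roughly linear relationship between log10(size) and migration distance. The procedure above does not describe the markers used, so the following is only a minimal sketch under that assumption: the marker positions are hypothetical and were chosen to lie exactly on a semi-log line, and the function names are illustrative.

```python
import math


def fit_semilog(markers):
    """Least-squares fit of log10(size_kb) = slope * distance_mm + intercept."""
    xs = [d for d, _ in markers]
    ys = [math.log10(size) for _, size in markers]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_size_kb(distance_mm, slope, intercept):
    """Convert a band's migration distance into an estimated size in kb."""
    return 10 ** (slope * distance_mm + intercept)


# Hypothetical marker bands (distance in mm, size in kb) on an exact semi-log line
markers = [(d, 10 ** (1 - 0.05 * d)) for d in (5.0, 10.0, 15.0, 20.0, 25.0)]
slope, intercept = fit_semilog(markers)
print(f"{estimate_size_kb(12.0, slope, intercept):.2f} kb")  # 2.51 kb
```

Real marker bands scatter around the line, so estimates are usually quoted to about one decimal place in kb, as in the Results.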
Rapid Amplification of cDNA Ends (RACE)-To determine the transcription initiation or termination sites of novel genes, 5′- or 3′-RACE was performed with the SMART™ RACE cDNA Amplification Kit (Clontech) as described by the supplier. Briefly, first strand cDNA was synthesized using 1 μg of testis poly(A)+ RNA, the 5′/3′ cDNA synthesis primer, SMART II™ oligonucleotide, and PowerScript™ reverse transcriptase. This cDNA was then used in PCR with universal primer mix and gene-specific primers under the following conditions: 30 cycles of 5 s at 94°C, 10 s at 68°C, and 3 min at 72°C. The resulting PCR products were resolved on a low melting agarose gel, and the appropriate band was excised and purified for cloning or sequence determination. RACE products were cloned into the pCR2.1 vector (Invitrogen) and sequenced.
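After sequencing, a RACE product is joined to the known transcript sequence through the region where the two overlap. The minimal sketch below shows that joining step for exact matches only; the sequences and the minimum-overlap parameter are hypothetical, and real assembly would need to tolerate sequencing errors.

```python
def merge_overlap(upstream, downstream, min_overlap=20):
    """Join two sequences through the longest exact suffix/prefix overlap.

    Returns upstream + the non-overlapping part of downstream, or None when
    no overlap of at least min_overlap bases is found.
    """
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    for k in range(min(len(upstream), len(downstream)), min_overlap - 1, -1):
        if upstream[-k:] == downstream[:k]:
            return upstream + downstream[k:]
    return None


# Hypothetical 5'-RACE product: new upstream sequence followed by a region
# that overlaps the 5' end of the known (data base) transcript.
race_5prime = "GGGACTTTCC" + "ATGGCTAGCA"
known = "ATGGCTAGCA" + "TTGACCAAGT"
print(merge_overlap(race_5prime, known, min_overlap=8))
# GGGACTTTCCATGGCTAGCATTGACCAAGT
```

For a 3′-RACE product the arguments are swapped, `merge_overlap(known, race_3prime)`, because the new sequence then extends the known transcript downstream.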
In Silico Analysis-The cDNA sequences of the novel genes were used as BLAST queries against the NCBI Mouse Genome Resources (available on the World Wide Web at www.ncbi.nlm.nih.gov/genome/seq/MmBlast.html) and the Wellcome Trust Sanger Institute Mouse Genome Server (available on the World Wide Web at www.ensembl.org/Mus_musculus/) to investigate exon-intron structures, chromosomal locations, and human synteny locations. Amino acid sequences deduced from the cDNA sequences of the novel genes were analyzed using several computational bioinformatics tools. PROSITE (available on the World Wide Web at us.expasy.org/prosite/), PFAM (available on the World Wide Web at www.sanger.ac.uk/Software/Pfam/search.shtml), and SMART (available on the World Wide Web at smart.embl-heidelberg.de/) were used to predict the presence of various protein patterns and profiles. SignalP and TMHMM were used to analyze and predict the (6).
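SignalP and TMHMM are trained prediction programs and are not reproduced here. As a simpler, swapped-in illustration of how hydrophobic, potentially membrane-spanning stretches can be flagged in a deduced protein sequence, the sketch below computes a Kyte-Doolittle hydropathy profile over a sliding window. The 19-residue window and the 1.6 cutoff follow the commonly cited Kyte-Doolittle heuristic for transmembrane segments; the test sequence is hypothetical.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein, window=19):
    """Mean hydropathy of every complete window along the sequence."""
    scores = [KD[aa] for aa in protein.upper()]
    return [sum(scores[i:i + window]) / window
            for i in range(len(scores) - window + 1)]


def candidate_tm_windows(protein, window=19, threshold=1.6):
    """1-based start positions of windows whose mean hydropathy exceeds threshold."""
    profile = hydropathy_profile(protein, window)
    return [i + 1 for i, h in enumerate(profile) if h > threshold]


# Hypothetical sequence: a 19-residue hydrophobic core between charged flanks
seq = "DEKR" * 5 + "L" * 19 + "DEKR" * 5
hits = candidate_tm_windows(seq)
print(hits)
```

Dedicated predictors such as TMHMM also model helix boundaries and topology, which a fixed-window average cannot capture.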

RESULTS
The Round Spermatid UniGene Library and In Silico Selection of Novel Gene Candidates-To discover and investigate unique testicular genes expressed in postmeiotic germ cells, we analyzed the McCarrey Eddy round spermatid library (Library 6786), one of the largest mouse spermatogenic cell libraries deposited in the UniGene data base at the NCBI (available on the World Wide Web at www.ncbi.nlm.nih.gov). As of September 2004 (Mus musculus UniGene Build 141), the round spermatid library consisted of 2124 UniGene entries. We classified genes in the library according to the following criteria. (i) Genes that had previously been named or assigned potential functions were counted as known genes, and unnamed genes with unknown or unassigned functions were considered unknown (novel) genes. (ii) A gene was classified as testis-specific if all of its ESTs were testicular or if its testicular ESTs greatly outnumbered its nontesticular ESTs. This classification of the 2124 gene entries revealed that about half of the entries are known genes, 121 of which are testis-specific (Table II and Supplementary Table I). We further categorized the relatively well known testis-specific genes (the 91 named genes, 64 of which had been assigned gene ontology terms) by their gene ontology codes, which provide information about cellular component, molecular function, and biological process (Fig. 1). The overall gene ontology profile of these genes was similar to that of mouse genes genome-wide (7), except for a prominent expansion of the testis-specific genes in the categories of development and cell organization/biogenesis. This likely reflects the involvement of these genes in biological processes unique to spermatogenesis.
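The legend of Table II makes criterion (ii) concrete: a gene was called testis-specific if more than 90% of its ESTs were testicular and its remaining ESTs came from at most three nontesticular tissues. A minimal sketch of that rule follows; the input format (a mapping from tissue name to EST count, with the key "testis") is an assumption for illustration, and the example counts are hypothetical.

```python
def is_testis_specific(est_counts, min_fraction=0.9, max_other_tissues=3):
    """Table II rule: >90% testicular ESTs, ESTs from at most 3 other tissues.

    est_counts: mapping of tissue name -> number of ESTs in the cluster
    (assumed input format; the "testis" key name is illustrative).
    """
    total = sum(est_counts.values())
    if total == 0:
        return False
    testis = est_counts.get("testis", 0)
    other_tissues = sum(1 for tissue, n in est_counts.items()
                        if tissue != "testis" and n > 0)
    return testis / total > min_fraction and other_tissues <= max_other_tissues


# Hypothetical EST counts for three clusters
print(is_testis_specific({"testis": 45, "brain": 2, "liver": 1}))   # True
print(is_testis_specific({"testis": 20, "brain": 5}))               # False
print(is_testis_specific({"testis": 100, "brain": 1, "lung": 1,
                          "heart": 1, "kidney": 1}))                 # False
```

The third cluster fails despite being about 96% testicular, because its ESTs come from four nontesticular tissues.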
By contrast, the other half of the entries (unknown genes) contained many more testis-specific genes (346 genes) than the known half (Table II and Supplementary Table II), indicating that the testis-specific genes in the library have been largely unexplored. Taken together, known and unknown testis-specific genes (467 in total) make up over 20% of the current round spermatid library UniGene entries.
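The proportions quoted here and in the Discussion follow directly from the counts in Table II; the short calculation below reproduces them.

```python
TOTAL_ENTRIES = 2124        # round spermatid library, UniGene Build 141
KNOWN_TS, UNKNOWN_TS = 121, 346

testis_specific = KNOWN_TS + UNKNOWN_TS
share_of_library = testis_specific / TOTAL_ENTRIES
share_unknown = UNKNOWN_TS / testis_specific

print(f"{testis_specific} testis-specific entries = {share_of_library:.1%} of the library")
print(f"unknown genes among them: {share_unknown:.1%}")
# 467 testis-specific entries = 22.0% of the library
# unknown genes among them: 74.1%
```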
At the beginning of our study (November 2002), the earlier version of the round spermatid library comprised 933 gene entries. A search of this library for testis-specific genes with unknown or unassigned functions yielded 157 such genes. We arbitrarily selected about half of them (73 genes; Supplementary Table III) as the candidates analyzed in the present study (Table II).
Authenticity of Novel Genes with Testis-specific Expression-To determine whether the candidates selected from the UniGene library are genuine novel genes with testis-specific expression, we performed various expression analyses. An RT-PCR assay showed that 42 of the 73 candidates are abundantly transcribed, with products of the expected sizes, in mouse testis. By contrast, for the other 31 candidates, no (10 candidates), very low levels of (14 candidates), or incorrectly sized (7 candidates) PCR products were detected from the testis, and these candidates were eliminated from further analysis. Note that the PCR assays were designed with similar primer properties and product sizes across candidates, and the same reaction conditions were used for all candidates. Two of the 42 testicular genes were retired from the UniGene data base during the course of our study. The tissue distribution of the remaining 40 genes was investigated by PCR using mouse cDNAs from different tissues, and 28 of the 40 genes were found to be testis-specific. Collectively, the analyses of the 73 candidates identified 28 authentic genes with evident testis-specific expression (Tables I and II and Supplementary Data). Fig. 2 shows the results of these analyses. The transcripts of all 28 genes were PCR-amplified at the correct sizes (Fig. 2A and Table I) and were specifically expressed in the testis (Fig. 2B). Further expression analysis revealed that none of the genes is transcribed in a Sertoli cell line, 15P-1 (4), or in the germ cell-lacking testes of W/Wv (c-kit) mutant mice (5) (Fig. 2A). These results suggest that the novel genes are expressed specifically in germ cells within the testis.
TABLE II. Classification of genes in the round spermatid library. Gene entries from the round spermatid UniGene library were classified into known and unknown genes, or testis-specific and non-testis-specific genes. Among known genes, genes that are unnamed but annotated with gene ontology codes in Mouse Genome Informatics (available on the World Wide Web at www.informatics.jax.org/) were termed "assigned." Genes whose ESTs were >90% testicular, with ESTs from at most three nontesticular tissues, were classified as testis-specific. About half of the unknown, testis-specific genes selected from the earlier version of the library were analyzed in vitro to determine whether they are authentic genes with abundant and evident testis-specific expression. Some of these candidates are excluded from the currently classified group of unknown, testis-specific genes, since their tissue information was modified or they were retired from the data base during updates of the library. All of the testis-specific genes (known, unknown, and analyzed) are listed in Supplementary Tables I-III.
FIG. 1. Gene ontology profile of known, testis-specific genes. Of the 91 named, testis-specific genes in the round spermatid library, 64 had previously been assigned gene ontology terms in the Mouse Genome Informatics data base (Supplementary Table I). The gene ontology terms of these genes were grouped into a number of categories belonging to the three main classes: cellular component, molecular function, and biological process.
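The screening of the 73 candidates can be tallied step by step to check that the exclusions account for every gene. The sketch below uses the counts reported in this section, with the 12 genes expressed outside the testis taken as the difference between 40 and 28; the step labels are paraphrases.

```python
def screening_funnel(start, exclusions):
    """Subtract each exclusion step in turn; return (step, removed, remaining)."""
    remaining = start
    rows = []
    for step, removed in exclusions:
        remaining -= removed
        rows.append((step, removed, remaining))
    return rows


EXCLUSIONS = [
    ("no testicular PCR product", 10),
    ("very low PCR signal", 14),
    ("incorrectly sized product", 7),
    ("retired from UniGene", 2),
    ("expressed outside testis", 12),
]

for step, removed, remaining in screening_funnel(73, EXCLUSIONS):
    print(f"{step:<28} -{removed:>2} -> {remaining}")
```

The running totals pass through 42 (after the RT-PCR screen) and 40 (after data base retirements) and end at the 28 authentic genes.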
Transcript Analysis and Genomic Characterization-To determine the expression levels and transcript sizes of the 28 genes, we performed Northern blot analysis (Fig. 3). All blots showed strong signals in testis RNA but not in liver RNA, used as a negative control, indicating abundant expression of the novel genes in the testis. Testicular transcripts ranged in size from 0.5 kb (Mm.48791) to 5 kb (Mm.262714). For 23 genes, transcript sizes determined by Northern blot analysis were comparable with those estimated from the UniGene data base. The other five genes (Mm.353417, Mm.72938, Mm.158134, Mm.262714, and Mm.266854) showed significant differences in transcript size (>0.5 kb) between the Northern blots and the data base sequences, suggesting that these genes have additional transcript sequence (Fig. 3). To obtain the full-length transcript sequences of these genes, we performed RACE. This extended Mm.353417 from 0.422 to 2.150 kb (GenBank™ accession number AY702103) and Mm.266854 from 0.332 to 1.201 kb (GenBank™ accession number AY702102). Thus, the transcript sequences of 25 genes can be regarded with confidence as full-length cDNAs or as sequences containing most of the entire cDNA (Figs. 3 and 4). Northern blot analysis also showed that six genes (Mm.34841, Mm.148858, Mm.272846, Mm.338094, Mm.353417, and Mm.48791) produce transcripts of more than one size (Fig. 3), suggesting the presence of multiple transcript isoforms, possibly generated by alternative splicing.
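The criterion used to flag the five genes, a difference of more than 0.5 kb between the Northern band and the data base transcript, can be written as a simple filter. In this sketch the gene names and sizes are hypothetical placeholders, not data from Fig. 3, and each gene is assumed to give a single band.

```python
def flag_size_discrepancies(sizes, tolerance_kb=0.5):
    """Genes whose Northern band and data base transcript differ by > tolerance_kb.

    sizes: mapping gene -> (northern_kb, database_kb); one band per gene assumed.
    """
    return sorted(gene for gene, (northern_kb, db_kb) in sizes.items()
                  if abs(northern_kb - db_kb) > tolerance_kb)


# Hypothetical placeholder values, not measurements from this study
example = {"geneA": (1.4, 1.3), "geneB": (2.2, 0.4), "geneC": (0.9, 0.5)}
print(flag_size_discrepancies(example))  # ['geneB']
```

Genes with several bands, such as the six isoform-producing genes, would need one comparison per band.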
To characterize the genomic organization of the novel genes, we searched genome data bases with the transcript sequences. Fig. 4 shows the genomic structures, exon organization, and chromosomal locations of the genes. The identity of the last exon was confirmed by 3′-RACE for most of the genes (data not shown). The genes range in size from 1 to 100 kb, and their exon numbers also vary, from single-exon genes to a 20-exon gene. For the three genes whose transcript sequences are not full-length (Mm.72938, Mm.158134, and Mm.262714), gene size and exon number could be larger than the present estimates. The novel genes are widely distributed across the mouse chromosomes (Fig. 4). To extend these findings, we searched the human genome data base for human orthologues. Human orthologues of 18 mouse genes were found, all located in genomic regions of conserved synteny between mouse and human. However, the other 10 mouse genes have no human orthologues, suggesting differential expansion of these genes in the mouse genome (see "Discussion").
FIG. 2. Testicular expression of the novel genes. A, cDNAs from testes of wild-type (WT) mice, testicular somatic (Sertoli) cells, and the germ cell-lacking testes of W/Wv mutant mice were amplified by PCR. All of the genes were expressed in the wild-type testis but not in Sertoli cells or in the testis lacking germ cells, suggesting germ cell-specific expression. Bands at 1 kb represent amplified products of glyceraldehyde-3-phosphate dehydrogenase (G3PDH). B, tissue distribution of the genes by RT-PCR analysis in various tissues of adult male mice. All of the genes were exclusively expressed in the testis. Comparable G3PDH band intensities indicate equivalent amounts of cDNA template among tissues. M, skeletal muscle; B, brain; Lu, lung; H, heart; Li, liver; K, kidney; T, testis; S, spleen.
Developmental Stage-dependent Expression of Novel Genes-To investigate the developmental expression pattern of the novel testis-specific genes during spermatogenesis, we performed RT-PCR using mouse testes obtained at different days after birth. In the first round of spermatogenesis in the prepubertal mouse, stem cells proliferate and differentiate gradually to yield, in sequence, spermatogonia, spermatocytes, and spermatids (Fig. 5A) (8). Meiosis begins at about day 11 in the mouse. By day 15, pachytene spermatocytes account for about one-third of the total cell population in the seminiferous epithelium. Once the first round of meiosis is completed by day 21, round spermatids appear in the seminiferous tubules. Thus, if a gene is expressed in germ cells during spermatogenesis, its transcript should first appear in the testis at a postnatal time point corresponding to a specific stage of spermatogenesis. The RT-PCR results showed that expression of every novel gene was first detected after day 12, indicating that expression of these germ cell-specific genes is developmentally regulated during the meiotic and postmeiotic phases (Fig. 5, B and C). The genes could be divided into two groups based on their expression patterns. In the first group of 14 genes, expression starts during meiotic prophase, at days 14-20, corresponding to pachytene spermatocytes (Fig. 5B). The 14 genes of the second group are expressed in postmeiotic germ cells (spermatids) (Fig. 5C). A gene encoding ADAM2 (a disintegrin and metalloprotease 2) and the protamine-2 gene, known germ cell-specific and developmentally regulated genes crucial for spermatogenesis or fertilization, were included for comparison of their expression patterns with those of the novel genes (9-11). Notably, a considerable number of the genes selected from the round spermatid library are already expressed in pachytene spermatocytes.
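The grouping by expression onset can be sketched as a function of the first postnatal day at which an RT-PCR product is detected. The cutoffs below are assumptions taken from the approximate timeline described above (meiosis beginning around day 11 and round spermatids appearing by about day 21), and the sampling days in the example are hypothetical.

```python
MEIOSIS_START_DAY = 11      # approximate onset of meiosis in the mouse
ROUND_SPERMATID_DAY = 21    # round spermatids appear by about this day


def classify_onset(detected_days):
    """Assign a gene to an expression group from the days it was detected.

    detected_days: postnatal days with a detectable RT-PCR product.
    """
    if not detected_days:
        return "not detected"
    onset = min(detected_days)
    if onset < MEIOSIS_START_DAY:
        return "premeiotic"
    if onset < ROUND_SPERMATID_DAY:
        return "spermatocyte (meiotic)"
    return "spermatid (postmeiotic)"


# Hypothetical detection results for two genes sampled at several ages
print(classify_onset([14, 16, 18, 20, 28, 90]))  # spermatocyte (meiotic)
print(classify_onset([22, 28, 90]))              # spermatid (postmeiotic)
```

Because onset is read from discrete sampling days, the assigned group is only as precise as the spacing of the time points.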
In Silico Analysis of Protein Characteristics-To gain insight into the structures and functions of the proteins encoded by the novel genes, we defined the protein-coding region of each gene as the longest open reading frame ending upstream of the polyadenylation signal, if one was present (Fig. 4), and subjected the deduced amino acid sequences to protein data base searches. The predicted coding regions of most genes are considered accurate, except for the three genes whose known transcript sequences are substantially shorter than the bands observed on Northern blots (Figs. 3 and 4). Supplementary Table IV shows the hydrophobicity, specific domains/motifs/regions, and gene ontology of the predicted proteins. Nineteen gene products were predicted to contain various domains, motifs, or regions. Exploration of gene ontology, which predicts how gene products behave in a cellular context, also revealed diverse protein properties for 20 genes. Note that none of the novel genes had been annotated with gene ontology codes in the Mouse Genome Informatics data base; the present information was therefore obtained by our own assignment of potential gene ontology codes to the genes, based on BLAST searches with their sequences and relevant hits in Web-based gene ontology servers (6). We attempted to categorize these 20 genes based on gene ontology information and relate them to putative functions in reproduction (Table III).
FIG. 3. Transcript analysis by Northern blot hybridization and RACE. Total RNAs from testis (T) and liver (Li) were hybridized with cDNA probes of the genes. Agarose gels were stained with ethidium bromide to visualize 28 S and 18 S RNAs. Five genes with significant differences between the transcript sizes observed on Northern blots and those predicted from the UniGene data base (DB) were subjected to RACE. This yielded additional new sequence for two genes (Mm.353417 and Mm.266854), giving transcript sizes comparable with those on the Northern blots. Transcript sizes from known sequences (UniGene data base and RACE), transcripts with significant size differences between the Northern blots and cDNA sequences, and transcripts with isoforms are indicated below the blots.
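The coding-region rule described in the In Silico Analysis of Protein Characteristics section, taking the longest open reading frame that ends upstream of the polyadenylation signal, can be sketched as follows. This is a minimal illustration under stated assumptions: only the sense strand is scanned, ORFs start at ATG, only the canonical AATAAA signal is recognized and its last occurrence is used, and the test sequence is hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf_before_pas(cdna, pas="AATAAA"):
    """Longest ATG-to-stop ORF on the sense strand that ends upstream of the
    last polyadenylation signal (or anywhere, if no signal is present).

    Returns (start, end, n_amino_acids): 0-based start, exclusive end that
    includes the stop codon, and the number of encoded residues; or None.
    """
    seq = cdna.upper()
    limit = seq.rfind(pas)
    if limit == -1:
        limit = len(seq)  # no signal found: scan the whole sequence
    best = None
    for frame in range(3):
        start = None
        # Only codons lying entirely before the signal are examined.
        for i in range(frame, limit - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                n_aa = (i - start) // 3
                if best is None or n_aa > best[2]:
                    best = (start, i + 3, n_aa)
                start = None  # look for the next ORF in this frame
    return best


# Hypothetical cDNA: 5' UTR, ATG + 3 codons + TAA, then the signal and a poly(A) tail
cdna = "CC" + "ATG" + "GCT" * 3 + "TAA" + "GG" + "AATAAA" + "A" * 5
print(longest_orf_before_pas(cdna))  # (2, 17, 4)
```

For the three genes whose known sequences are shorter than their Northern bands, a rule like this can only find the longest ORF within the sequence available, which is why their predictions are treated as provisional.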
Five of the gene products seem to function as transcriptional regulators during spermatogenesis; three of them are expressed during the meiotic phase and two during the postmeiotic phase (Fig. 5). Another five gene products are potentially involved in cell-cell interaction or communication during spermatogenesis. Two of these, which have a transmembrane region, could also function during fertilization. Four proteins in a separate category appear to play structural roles. Notably, all of these gene products are expressed postmeiotically (Fig. 5) and thus could function during spermiogenesis. Alternatively, three of them, which have actin-binding activity, could be involved in actin remodeling during sperm capacitation and the acrosome reaction. Three of the novel genes seem to encode proteins localized to the nucleus and involved in nuclear activity or integrity. A group of two genes expressed during the meiotic phase (Fig. 5) could function during meiotic division, since they are predicted to be involved in chromosome condensation or microtubule nucleation. A final category contains a single gene product with potential tyrosine aminotransferase (transaminase) activity, which catalyzes the first step of tyrosine catabolism, a pathway whose products can supply carbon for glucose synthesis. This protein might be involved in sperm capacitation or motility, processes known to be enhanced by glucose.
FIG. 4 (partial legend). The numbers of amino acids (No. aa) corresponding to the predicted coding regions are listed; those for genes whose known transcript sequences are substantially shorter than the bands observed on Northern blots (Fig. 3) are shown in parentheses. Chromosomal locations were determined by searches of the mouse and human genome data bases.
FIG. 5 (partial legend). B and C, gene expression was analyzed by RT-PCR using cDNAs prepared from mouse testes at different days after birth. Genes were divided into two groups by the timing of expression onset: in pachytene spermatocytes (Sc-P) or in spermatids (Std). The ADAM2 and protamine-2 genes were included as reference genes. G3PDH, glyceraldehyde-3-phosphate dehydrogenase.
TABLE III. Putative functions of novel genes in reproduction. Proteins encoded by the novel genes were functionally categorized based on gene ontology (Supplementary Table IV). In each category, the genes were divided into two groups according to whether expression starts at the spermatocyte stage (meiotic phase) or the spermatid stage (postmeiotic phase) (Fig. 5). Genes without human orthologues, and thus specific to mouse (Fig. 4), are indicated by "m" in parentheses.

DISCUSSION
In the present study, we identified and characterized 28 novel testis-specific genes by in silico and in vitro approaches, providing comprehensive information about these genes. We initially selected the genes by analyzing the round spermatid UniGene library. UniGene is a large, widely used EST data base containing a great deal of unexplored information about genes, and it thus provides a tremendous resource for identifying novel tissue-, cell-, or stage-specific transcripts. Previously, a number of studies have identified genes expressed at a specific stage or in a particular cell type during spermatogenesis using differential display, cDNA/oligonucleotide arrays, or subtractive hybridization (12-17). These studies provided broad information about the expression profiles of a large number of germ cell genes with known or unknown functions. In comparison, our study is distinctive in its systematic identification of uncharacterized genes with testis- and spermatogenic cell-specific expression. Recently, the number of genes specifically expressed in spermatogenic cells was predicted by in silico analysis of the mouse transcriptome (13). Of 62,692 gene entries with a cluster size of at least two sequence entries in the UniGene data base, an estimated ~2,375 genes are expressed exclusively in meiotic and postmeiotic germ cells, and only about 7% of these have known functions. Although these numbers might be overestimated owing to sequence redundancy, they indicate the presence of a large number of uncharacterized, germ cell-specific genes. Similarly, our analysis of the round spermatid UniGene library revealed that 467 of 2124 genes are testis-specific and that the functions of 74% (346 genes) of the testis-specific genes are unknown. The initial number of uncharacterized, testis-specific genes selected from the earlier version of the library was 157, and 73 of them were analyzed in the present study.
These 73 genes were narrowed down to 28 authentic genes considered to be germ cell-specific by various expression analyses. The other 45 genes were eliminated from consideration because they were not detected (10 genes), observed at a low level (14 genes), detected with unexpected sizes (seven genes) or non-testis-specific (12 genes) in the PCR assay, or retired from the data base (two genes).
Our study provides extensive information about the 28 novel genes at the transcript and genomic levels. Northern blot analysis, critical but usually omitted in large scale studies, revealed various transcript-level characteristics of the genes, such as expression level, size, and the presence of isoforms. For five genes, we found significant differences between the transcript size expected from the UniGene data base and that determined experimentally. The full-length transcript sequences of two of these genes were identified by cDNA cloning. Further, we found that six genes produce transcripts of more than one size. The genomic analysis uncovered an intriguing feature: the absence of human orthologues for 10 of the mouse genes. Despite extensive synteny between the mouse and human genomes, only about 80% of mouse genes have a single identifiable orthologue in the human genome (7). The remaining 20% lack a strict 1:1 relationship because of differential expansion in at least one of the two genomes, and most genes expanded in the mouse genome are involved in reproduction, olfaction, and immunity. Similarly, a global comparison of human and mouse proteases revealed that the mouse degradome is more complex and that many of the genes expanded in the mouse genome encode proteases involved in reproductive functions (18). Examples include a number of testis-specific or testis-predominant ADAM genes expressed in postmeiotic germ cells (18, 19). The 10 mouse-specific genes identified in our study could be related to rodent-specific aspects of reproductive physiology.
A group of genes whose expression is strongly favored in male germ cells, encoding proteins with essential roles during spermatogenesis, has been named chauvinist genes (1, 2). These genes fall into three categories: (i) homologous genes expressed only in spermatogenic cells but closely related to genes expressed in somatic cells; (ii) unique genes without significant nucleotide sequence similarity to genes expressed in any other cells; and (iii) genes transcribed in both somatic and spermatogenic cells that produce germ cell-specific transcripts through alternative transcription start sites, splice sites, or polyadenylation signals. A characteristic feature of chauvinist genes is that their expression is developmentally regulated during the meiotic and postmeiotic phases. Consistent with this, all of the novel genes in this study, which fall into the category of unique genes, are expressed in developmentally regulated patterns.
The primary regulator of male germ cell development is the intrinsic genetic program involving germ cell-specific and developmentally regulated genes. These genes are believed to be directly responsible for the sequence of events in spermatogenesis, such as meiotic progression and morphogenesis, that leads to the production of spermatozoa. Furthermore, their products may play significant roles in fertilization. We predicted the functions of the novel gene products and found that many of them are potentially implicated in molecular and cellular processes such as transcriptional regulation, cell communication, structural formation, nuclear activity, meiosis, and metabolism during spermatogenesis (meiotic and postmeiotic phases) or fertilization. Notably, only a handful of known, unique gene products have evident or potential roles in each of these processes (20). These include cAMP-response element modulator, activator of cAMP-response element modulator in testis, sterol regulatory element binding protein 2, and transcriptional intermediary factor 1δ in transcriptional regulation (21-25); testis-specific protein kinase 1 in signal transduction during spermatogenesis (26); ADAMs in cell-cell interaction during fertilization (10, 27, 28); testis-specific actin-capping protein, capping protein β3, and mitochondrial capsule selenoprotein in structural regulation during spermiogenesis or the acrosome reaction (29-32); transition proteins and protamines in regulating the nuclear structure of spermatogenic cells and sperm (11, 33-36); meiosis expressed gene 1 and heat shock protein 70-2 in meiosis (37-39); and phosphoglycerate kinase 2, lactate dehydrogenase 3, pyruvate dehydrogenase α2, and testis-specific cytochrome c in energy metabolism (40-43).
Some of these genes with functions in spermatogenesis are not included in the list of known, testis-specific genes from the round spermatid library (Table II and Fig. 1), either because their tissue distribution differs between in vitro experiments and in silico prediction or because they are absent from the library owing to expression at other germ cell stages.
In conclusion, many of the regulatory mechanisms underlying male germ cell development remain to be uncovered. Spermatogenic stage- and cell-specific gene expression is clearly crucial for the developmental changes involved. Our in silico analysis indicates that a significant proportion (22%) of the genes expressed in round spermatids are testis-specific and that most (74%) of these testis-specific genes remain unexplored. Using systematic and integrative approaches, we discovered and characterized 28 novel, unique genes, providing new and comprehensive information such as the presence of additional transcript sequences (five genes), full-length cDNA sequences (two genes), multiple transcript isoforms (six genes), mouse-specific genes (10 genes), and genes with potential functions in male reproduction (20 genes). Our study should provide a firm basis for future functional characterization of these genes, leading to elucidation of the molecular mechanisms underlying mammalian male reproduction.