The High Molecular Weight Chromatin Proteins of Winter Flounder Sperm Are Related to an Extreme Histone H1 Variant*

Unlike mammals, birds, and most other fishes, winter flounder completes spermatogenesis without replacing its germ cell histones with protamines. Instead, during spermiogenesis, these fish produce a family of high molecular weight (80,000–200,000) basic nuclear proteins (HM rBNPs) that bind to sperm chromatin containing the normal complement of histones. These large, basic proteins are built up of tandem iterations of oligopeptide repeats that contain phosphorylatable DNA-binding motifs. Although the HM rBNPs have no obvious homology to histones, protamines, or other sperm-specific chromatin proteins, we report here the isolation of a clone (2B) from a winter flounder genomic DNA library that establishes a link between the HM rBNPs and histone H1. The 2B sequence contains an open reading frame, which, when conceptually translated, encodes a 265-residue protein. At its N terminus the translation product contains numerous simple repeats that match the oligopeptides contained within the HM rBNPs. Unexpectedly, the C terminus of the putative protein shows 66% identity and 76% conservation to the histone H1 globular domain. This connection suggests that the HM rBNPs may have originated from the extended N-terminal tail region of a testis-specific, H1-like linker histone.

In almost all eukaryotic cells, histones have a fundamental role in organizing and condensing DNA (1,2). It is therefore not surprising that the sequences of the core histones (H2A, H2B, H3, and H4) are extremely well conserved and contain many basic residues. The fifth histone (H1), which may or may not sit outside the nucleosomal core (3), is the longest, most variable, and most lysine-rich member of the histone family. The structure of histone H1 can be subdivided into three domains: a variable N-terminal region of 35-40 residues with a net positive charge; a well conserved globular domain of 80 residues (4), which is thought to interact with both core histones and nucleosomal DNA; and a very basic C-terminal tail of ~90 residues, 90% of which is lysine, alanine, and proline.
Most organisms possess more than one tissue- or stage-specific histone H1 variant (5,6). For example, sperm-specific histone H1 variants (H1T) are commonly found in mammals (6), amphibians (7), and invertebrates (8,9). H1Ts typically have a shorter C-terminal domain and tails that contain a higher proportion of positively charged residues (usually Arg) than their somatic counterparts, as well as a greater number of phosphorylation sites (6). However, the sperm-specific H1 of sea urchin (SpH1) is longer than its somatic counterpart at both ends due to N- and C-terminal extensions composed of tetrapeptide repeats (SPXB, where X is usually basic, and B is K or R) (10).
This trend to increased basicity and a higher arginine content in sperm-specific histones may facilitate condensation of the DNA into the sperm nucleus. In fact, the switch from somatic to sperm chromatin can be accomplished using a variety of proteins. One strategy used by some vertebrates and many invertebrates is to retain histones in a nucleosomal arrangement but to incorporate sperm-specific histone variants and/or other specialized basic proteins into the condensing chromatin (11,12). In mammals, birds, and most fishes, DNA condensation is ultimately accomplished using protamines. These small arginine-rich proteins replace the histones and by doing so eradicate the nucleosomal organization established by the histones.
The winter flounder is one of the minority of bony fishes that retains its histones throughout spermatogenesis and does not replace them with protamines. Moreover, it does not synthesize significant quantities of sperm-specific histone variants (13). The winter flounder does, however, produce a group of high molecular weight basic nuclear proteins (HM r BNPs) in mid- to late spermatids. These unique proteins are retained in the mature sperm, where they comprise >25% of the total acid-soluble proteins. As judged by SDS-polyacrylamide gel electrophoresis, there are at least 15 HM r BNPs that range in apparent molecular weight from 80,000 to 150,000, with a major band at ~110,000 and trace quantities of larger proteins up to 200,000. Amino acid analysis revealed that this group of proteins is constructed primarily from four amino acids: Arg (24%), Ser (23%), Lys (15%), and Pro (14%), which reflects their underlying simple repetitive sequences of dodecapeptides, with the consensus sequence SPMRSRSPSRSK, and heptapeptides, with the sequence RRVXXPK (where XX is QT or PS) (14). This simple composition and intermediate basicity suggest that the HM r BNPs might best fit in the class of chromatin proteins intermediate between histones and protamines (15).
Although the extreme repetitiveness of the HM r BNPs has precluded us from directly sequencing the proteins and from isolating full-length cDNA and genomic clones, we have obtained partial nucleotide sequences including the proximal promoter, 5′ and 3′ UTRs, and about 1.5 kb of the coding region (manuscript in preparation, see GenBank accession numbers U39735, U39845, and U39932). Using these tools, we have investigated the genomic structure of the HM r BNPs. Here we report the isolation of a winter flounder HM r BNP genomic clone, designated 2B. The ORF of this clone encodes a 30-kDa protein whose N-terminal sequence shows homology to the HM r BNPs. Most interestingly, the C-terminal region of the putative protein encodes a histone H1-like globular core domain. This finding sheds new light on the origin of the HM r BNPs and their function in the developing sperm of the winter flounder.

EXPERIMENTAL PROCEDURES
Isolation and Purification of Genomic Clones-A partial Sau3A-digested winter flounder genomic DNA library was custom-made by Stratagene in λFixII. The amplified library was plated on Escherichia coli NM522, and plaques were transferred to nitrocellulose filters (Schleicher & Schuell). The membranes were screened with 32P-labeled HM r BNP cDNA clone 3′-5 (GenBank accession number U39735). Selected positive plaques were purified through two further rounds of screening.
Restriction Enzyme and Southern Hybridization Mapping of Genomic Clones-DNA from phage that had been banded twice on CsCl was digested with restriction enzymes and electrophoresed through a 0.8% agarose gel. The DNA was stained with ethidium bromide and photographed. A restriction enzyme map was constructed based on single and double digests. To identify regions of homology to the HM r BNP cDNA, the restriction enzyme fragments were also analyzed by Southern blotting using 32P-labeled HM r BNP cDNA clone 3′-5 (GenBank accession number U39735). The membrane was hybridized overnight at 68°C in 25 mM sodium phosphate (pH 7.2) containing 7% SDS. After hybridization, the membrane was washed twice for 20 min at 68°C with 0.1% SDS containing 0.5× SSC and then autoradiographed (XAR, Kodak). Regions of hybridization were subcloned into pBluescript (Stratagene) and sequenced (Sequenase 2.0, U. S. Biochemical Corp.).
Isolation and Southern Blot Analysis of Genomic DNA-Genomic DNA from an individual fish was isolated from frozen tissue as described (16). Genomic DNA (10 μg) and purified phage DNA (100 ng combined with 10 μg of calf thymus DNA) were digested with restriction enzymes, and the fragments were separated by agarose gel electrophoresis and transferred to a nylon membrane (Zeta-Probe GT, Bio-Rad). HM r BNP cDNA clone 3′-5 (GenBank accession number U39735) was radiolabeled and used to probe the bound DNA (see above).
Comparison of Genomic Clone 2B with HM r BNP Sequence-Dot matrix comparisons were performed using the Caltech DNA sequence analysis program, version 2.4. The sequences (2B from this paper and HM r BNP compiled from GenBank accession numbers U39845 and U39932) were compared using a stringency of 16 matches in a window of 20 bp.
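The windowed comparison described above can be illustrated with a short sketch. This is not the Caltech program; it is a minimal Python illustration that assumes an ungapped sliding window and reports a dot wherever at least 16 of 20 aligned bases are identical. The function name `dot_matrix` and its parameters are hypothetical.

```python
def dot_matrix(seq_x, seq_y, window=20, min_matches=16):
    """Return (i, j) window start positions (0-based) at which at least
    `min_matches` of the `window` aligned bases are identical.

    Assumption: a plain ungapped window comparison; the program used in
    the paper may differ in detail.
    """
    seq_x, seq_y = seq_x.upper(), seq_y.upper()
    hits = []
    for i in range(len(seq_x) - window + 1):
        win_x = seq_x[i:i + window]
        for j in range(len(seq_y) - window + 1):
            win_y = seq_y[j:j + window]
            # Count identical positions between the two windows.
            matches = sum(a == b for a, b in zip(win_x, win_y))
            if matches >= min_matches:
                hits.append((i, j))
    return hits


if __name__ == "__main__":
    # Toy sequences (not flounder data): 16 of 20 positions agree.
    x = "A" * 20
    y = "A" * 16 + "C" * 4
    print(dot_matrix(x, y))
```

Plotting the returned (i, j) pairs gives the familiar picture: a continuous diagonal for colinear similarity and additional parallel diagonals where tandem repeats match out of register.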

RESULTS
Isolation of Genomic Clones-The HM r BNPs are a family of abundant proteins produced only in the testis during spermatogenesis. Their abundance was reflected in the frequency with which HM r BNP clones were isolated from a mid-spermatid stage testis cDNA library (17). However, numerous attempts to recover a full-length cDNA have been hampered by the extreme repetitiveness of the sequence. As a result, we have had to piece together information about these proteins and their sequence using a number of different strategies. In an attempt to obtain information about HM r BNP gene structure and regulation, we screened a partial Sau3A-digested winter flounder genomic DNA library using an incomplete cDNA clone. Approximately 200,000 plaque-forming units (>3 genome equivalents) from the amplified library were screened using radiolabeled HM r BNP cDNA clone 3′-5. Over 50 hybridization signals of varying intensities were detected. Twenty-three of the phage producing strong signals were plaque-purified, and restriction enzyme maps for 10 of these clones were constructed.
The clones were grouped into two main classes (Fig. 1) based on the similarity and overlap of their restriction maps. Phage in class A formed a group of five overlapping clones (2B, 2D, 2E, 4B, and 1D) that spanned more than 20 kb. Each clone possessed a discrete region that hybridized strongly to the HM r BNP cDNA probe. With the exception of clone 4B (which was truncated in the region of hybridization), the match to the HM r BNP cDNA probe was restricted to a 1.6-kb SacI/HindIII fragment. Class B comprised three clones (3B, 3C, and 3F) representing ~18 kb of genomic DNA. These phage also contained a single region of hybridization that was localized around a common SacI site. The remaining two clones (1C and 2A) appear to have unique restriction maps and showed hybridization only at one end of the phage insert.
Based on SDS-polyacrylamide gel electrophoresis experiments, the HM r BNPs range in size from 80 to 200 kDa, with the most abundant protein being approximately 110 kDa (13). Such size estimates predict an mRNA of at least 2 kb and suggest that the isolated genomic clones would probably not encode a full-length HM r BNP sequence. However, because these clones could possibly represent HM r BNP exons, pseudogenes, or related sequences and might therefore provide important information about these proteins and their genes, we chose to investigate one of the clones (2B) from class A.
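The size argument above can be checked with simple arithmetic. The sketch below assumes an average residue mass of about 110 Da (a common rule of thumb, not a value taken from this paper) and counts only coding nucleotides, ignoring UTRs. Apparent masses from SDS gels can be inflated for highly basic proteins, so the result is an order-of-magnitude estimate only. The function name is hypothetical.

```python
# Assumption: ~110 Da per amino acid residue (rule of thumb).
AVG_RESIDUE_MASS_DA = 110.0


def min_coding_length_nt(protein_mass_da, avg_residue_mass_da=AVG_RESIDUE_MASS_DA):
    """Estimate the coding length (nt) needed for a protein of the given mass.

    Three nucleotides per residue; stop codon and UTRs are not included.
    """
    residues = protein_mass_da / avg_residue_mass_da
    return round(residues * 3)


if __name__ == "__main__":
    for mass in (80_000, 110_000, 200_000):
        print(mass, "Da ->", min_coding_length_nt(mass), "nt of coding sequence")
```

Under these assumptions even the smallest HM r BNP (80 kDa) needs a little over 2 kb of coding sequence, in line with the estimate in the text. The same rule of thumb is consistent with the 2B product: 265 residues at about 110 Da each is close to the 30 kDa quoted for its ORF.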
The Region of Clone 2B That Hybridized to HM r BNP cDNA Contains a Putative Gene-The 1.6-kb HindIII/SacI fragment (which hybridized to the HM r BNP cDNA) of genomic clone 2B and its flanking regions were subcloned and sequenced. The nucleotide sequence obtained contains a long ORF from bp 498-1324 (Fig. 2), which when conceptually translated (beginning at the first in-frame methionine codon) would encode a 265-amino acid-long, 30-kDa protein. This ORF is followed by two potential polyadenylation signals (AATAAA), 107 (bp 1431) and 225 bp (bp 1549) downstream from the predicted translation termination codon. The sequence reported here also extends over 500 bp on the 5′-side of the coding region.
The 2B Genomic Sequence Is a HM r BNP Homolog-Portions of the nucleotide sequence of 2B showed a high degree of identity to a representative HM r BNP gene sequence. Dot matrix analysis of the two sequences (Fig. 3) using a stringency of ≥80% identity (16 out of 20) showed extensive areas of homology. These regions included portions of the proximal promoter, the 5′-UTR, and the coding sequence. Curiously, the putative CCAAT and TATA boxes identified in the proximal promoter region of the HM r BNP sequence were not conserved in 2B. DNA coding for the predicted N-terminal region of 2B showed extensive identity to the 5′-UTR and coding region of the HM r BNP gene. Although the region in 2B (bp 557-626) matching the HM r BNP 5′-UTR has been translated in Fig. 2, it is not known at this time if this sequence is 5′-UTR or coding sequence because it is only after the second in-frame ATG in 2B (bp 596-598) that HM r BNP-like sequences begin (see below). Dot matrix analysis comparing the coding regions of 2B and HM r BNP showed a high degree of identity and tandem repetition between these two sequences as indicated by the multiple lines parallel to the diagonal (Fig. 3). These regions of identity occurred over discontinuous stretches of about 100 bp in length, suggesting a closely related but distinct protein sequence. The similarity to the HM r BNP sequence ended abruptly around nucleotide 1075 of clone 2B (codon 184).

FIG. 2. The region of clone 2B encompassing the area of hybridization to HM r BNP cDNA was subcloned and sequenced on both strands. The long ORF present has been translated (beginning at the first in-frame ATG), and the predicted amino acid sequence is given underneath. Base and residue numbers are shown on the right and left, respectively. The translation stop site is indicated with an asterisk. Two polyadenylation signals are shown (AATAAA). Matches to the HM r BNP heptapeptide and dodecapeptide repeats originally defined by endoproteinase Lys-C digestion (14) are underlined. The histone H1-like globular core region beginning at residue 189 is shaded.

Genomic Clone 2B Encodes a HM r BNP/Histone H1 Hybrid-The predicted amino acid sequence of the N-terminal region of 2B (residues 14-188) had an amino acid composition similar to that of the HM r BNPs. The same four amino acids Arg (21%), Ser (22%), Lys (17%), and Pro (11%) were by far the most abundant and together made up 71% of the composition. In addition, 2B contained numerous sequences (Figs. 2 and 4) similar to the heptapeptide and dodecapeptide repeats obtained by endoproteinase Lys-C digestion of the HM r BNPs (14). Interestingly, these repeats tend to occur in a defined order, interspersed with the sequences SPK and MRAKSPRRSK, such that they form a 32-amino acid sequence (Fig. 4). These larger repeats, with the consensus sequence KSPMRSRSPSRSKSPKRRVKTPKMRAKSPRRS, occurred four times (three linked in tandem) within the first 170 residues of the ORF and showed 71-90% identity to each other. Such repeats were also detected in the HM r BNP coding region (GenBank accession number U39735). The three HM r BNP repeats selected for comparison (x, y, and z) showed 80-90% similarity to the repeats in 2B. In addition, the predicted amino acid sequence of 2B contained four tandemly arrayed 7-amino acid repeats between residues 156-183 that overlap with the most C-terminal 32-amino acid repeat. These repeats (consensus sequence SPKMRAK) were distinct from both the heptapeptide repeats determined by peptide sequencing (RRVQTPK) and those present in the 32-amino acid repeats (RRVKTPK). They are more like the sequences that separate the dodecapeptide and heptapeptide repeats in the 32-amino acid repeats (SPK and MRAK). Therefore, residues 28-183 of clone 2B closely resemble the abundant HM r BNPs but with subtle differences in the repeat pattern.
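The internal organization of the 32-amino acid consensus can be verified directly from the sequences quoted above. The following sketch uses only the consensus, the dodecapeptide, and the heptapeptide given in the text; the helper names `composition_percent` and `motif_span` are hypothetical, and the composition is that of the consensus repeat alone, not of the full 2B N-terminal region.

```python
from collections import Counter

# Sequences quoted in the text.
CONSENSUS_32 = "KSPMRSRSPSRSKSPKRRVKTPKMRAKSPRRS"
DODECAPEPTIDE = "SPMRSRSPSRSK"
HEPTAPEPTIDE_32 = "RRVKTPK"  # heptapeptide form found within the 32-mer


def composition_percent(seq):
    """Return the percentage of each residue type in `seq`."""
    counts = Counter(seq)
    return {aa: 100.0 * n / len(seq) for aa, n in counts.items()}


def motif_span(seq, motif):
    """Return the 1-based (start, end) of the first exact match, or None."""
    start = seq.find(motif)
    if start < 0:
        return None
    return start + 1, start + len(motif)


if __name__ == "__main__":
    print(len(CONSENSUS_32))
    print(motif_span(CONSENSUS_32, DODECAPEPTIDE))
    print(motif_span(CONSENSUS_32, HEPTAPEPTIDE_32))
    comp = composition_percent(CONSENSUS_32)
    for aa in "RSKP":
        print(aa, comp[aa])
```

The spans show the order described in the text: the dodecapeptide, then SPK, then the heptapeptide, then the MRAKSPRRS linker that runs into the next repeat.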
The nucleotide and amino acid match to the HM r BNP sequences ends around amino acid 183 (bp 1075). A database search with the remaining 82 residues revealed high identity between this region and part of histone H1. In fact, residues 189-265 at the C terminus of 2B showed an indisputable similarity to the globular core region of histone H1 (Fig. 5). Typically, the core of histone H1 extends about 80 residues (from residues 35-40 to 115) and is highly conserved. By including a two-amino acid gap after residue 251 of 2B (to maximize identity), the overall alignment showed 66% identity and 76% conservation. By comparison to the known structures of the globular region of histones H1 and H5, this two-amino acid deletion occurs in the most flexible part of the structure, the β-hairpin between the last two β-strands (18). That this region can accommodate such a deletion is nicely illustrated by the structural homolog, catabolite gene activator protein, which has a 4-amino acid deletion at the equivalent point that slightly shortens the β-hairpin (4).
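How identity and conservation figures of this kind are obtained from a gapped alignment can be sketched as follows. The scoring scheme here is an assumption, not the one used in the paper: gap columns are excluded from the denominator, "conservation" counts identical residues plus substitutions within the illustrative residue groups listed in the code, and the toy sequences are invented for the example rather than taken from Fig. 5.

```python
# Assumption: illustrative conservative-substitution groups; the paper does
# not state which groups were used.
CONSERVATIVE_GROUPS = ["KRH", "DE", "ST", "NQ", "AVLIM", "FYW"]


def _group(residue):
    """Return the conservative group containing `residue`, or the residue itself."""
    for group in CONSERVATIVE_GROUPS:
        if residue in group:
            return group
    return residue


def identity_and_conservation(aligned_a, aligned_b, gap="-"):
    """Return (% identity, % conservation) over ungapped alignment columns.

    Conservation includes identical positions.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if a != gap and b != gap]
    if not columns:
        raise ValueError("no ungapped columns to compare")
    identical = sum(a == b for a, b in columns)
    conserved = sum(_group(a) == _group(b) for a, b in columns)
    return 100.0 * identical / len(columns), 100.0 * conserved / len(columns)


if __name__ == "__main__":
    # Toy alignment with a two-residue gap (hypothetical sequences).
    print(identity_and_conservation("KSPVKAR--G", "RSPLKAKTTD"))
```

A different choice of residue groups or gap treatment would shift the percentages, which is why such figures are best read together with the alignment itself.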
Testis-specific histone H1 variants have been found in some organisms, and the cores of three such proteins are present in the alignment shown in Fig. 5. There are a number of "testis-specific" amino acid substitutions in the well conserved core. These include (using the human H1T numbering) K50E, V52L, K56Q, and V61M. Of these, the last change was seen in 2B (residue 210).
The Isolated Genomic Clones Are Not the Product of in Vitro Recombination-Because the λFixII library was propagated in a Rec+ host, it was possible that the clones obtained from this library had been subject to internal recombination/deletion. To address this question, representative clones 1D (the longest of the class A clones) and 3B (a class B clone) were examined by Southern blot hybridization (Fig. 6). The hybridizing fragments from the clones (digested with three different restriction enzymes) all aligned perfectly with bands in the genomic DNA lanes. Clone 1D gave rise to single bands of hybridization at 10, 3, and 10 kb with EcoRI, HindIII, and SacI, respectively, whereas clone 3B produced 6-, 6-, and 5-kb bands of hybridization after digestion with the same three enzymes, respectively. This suggested that the clones were not the result of internal recombination but were continuous genomic fragments. However, as the corresponding genomic bands were far less intense on the autoradiograph than other HM r BNP gene signals, they are presumed to encode minor variants of the HM r BNP gene family or possibly pseudogenes.

FIG. 3. Dot matrix comparison of genomic clone 2B with HM r BNP sequence. The nucleotide sequence obtained for winter flounder genomic clone 2B (x axis) and the 5′-end of an HM r BNP gene (y axis) were compared using the Caltech DNA sequence analysis program. The stringency of the alignment was 16 matches in a window of 20. Diagonal lines indicate areas of identity and their extent. Schematic representations of the two genes are given on the respective axes, each divided into 100-bp graduations. The shaded rectangular regions represent the coding sequences of HM r BNP (dark) and H1 (light), respectively. Lines represent flanking noncoding sequences. The putative TATA and CCAAT boxes are represented by T and C, respectively; the transcription start site is marked by a circle; and the putative polyadenylation signals are indicated by stars.

FIG. 4. Genomic clone 2B contains 32-amino acid repeats. A, the first 200 residues of the ORF of genomic clone 2B. Amino acid sequences corresponding to the 32- and 7-amino acid repeats are shaded and underlined, respectively. The 32-amino acid repeats are identified by numbers, and the 7-amino acid repeats are identified by letters and are aligned to show identity in B and C. The histone H1 core-like region begins at residue 193. B, the consensus sequence (*) for the 32-amino acid repeats is shown with the repeats of clone 2B and the 5′-end of the HM r BNP gene given underneath. Amino acids that differ from the consensus are shaded. C, the 7-amino acid repeats are shown with the consensus sequence (*) above.

DISCUSSION
The main purpose of spermiogenesis is to produce a streamlined, motile cell that can efficiently transfer the male's genetic material to the egg. One of the requirements of this streamlining is the condensation of the nucleus. Different organisms have used various approaches to meet this challenge but have generally accomplished it by increasing the positive charges present on the sperm chromatin proteins, either through the introduction of sperm-specific histone variants or the replacement of histones by protamines (12). Among fish, both approaches are used. For example, rainbow trout (19) and yellow perch (20) replace histones with protamines, whereas grass carp retain their histones but produce a number of sperm-specific variants (21).
Prior to this study, it was thought that winter flounder used an extreme variation of the second approach. In addition to retaining their histones, they synthesize a novel group of sperm chromatin proteins, the HM r BNPs, that appear to be involved in binding and condensing DNA and are thus functionally related to protamines and linker histones (13,22). However, characterization of the HM r BNPs in isolation failed to identify a link to either group or indeed to any other proteins. Their amino acid composition is intermediate between those of histones and protamines, but their size and amino acid sequence resemble neither one. Recent database searches of GenBank and Swiss-Prot with the HM r BNP peptide repeats and partial nucleotide sequences did not reveal any similar proteins.
The isolation and sequencing of genomic clone 2B has at long last shed light on the origin of the HM r BNPs. The HM r BNPs are clearly related to linker histones because the putative coding region of 2B is homologous to the HM r BNPs at its N terminus and to the globular region of histone H1 at its C terminus. Also, the 5′-flanking region of the 2B ORF has very high sequence identity to the proximal promoter of the HM r BNP genes.
It is important to emphasize that this linkage is not a recombination artifact. When the λFixII genomic library was screened with HM r BNP cDNA, many hybridization signals were detected. Restriction enzyme analysis of 10 of these clones picked at random did not identify a full-length HM r BNP gene, despite their strong hybridization signal on phage DNA blots (see Fig. 5). Although one of the 10 clones isolated (1C) contained 1.5 kb from the 5′-end of a HM r BNP gene, this identity occurred at one end of the phage insert where the bulk of the repeat had been removed by the original Sau3A digestion (data not shown). The failure to isolate a full-length HM r BNP gene is entirely consistent with the extraordinary difficulties experienced in trying to clone these highly repetitive sequences at the cDNA level. However, because the host cells used to screen the genomic library (E. coli NM522) were Rec+, the possibility that the isolated clones had undergone recombination/deletion was investigated. Southern blot analysis of these clones alongside winter flounder genomic DNA indicates that they contain bona fide DNA fragments that have not been rearranged. It also suggests that gene 2B represents a minor constituent, perhaps a single copy gene, among the HM r BNP multi-gene family. Indeed, it is notable that the intransigence of the HM r BNP genes to cloning facilitated the isolation of clone 2B.

FIG. 6. ... were digested with the restriction enzymes indicated. Digests were electrophoresed through a 0.8% agarose gel and transferred onto a nylon membrane. The blot was probed with HM r BNP cDNA and then washed twice with 0.5× SSC/0.1% SDS for 20 min at 68°C. X-ray film was exposed to the membrane for 21 h. The representative genomic clones of class A and B and winter flounder genomic DNA are indicated as 1D, 3B, and G, respectively. DNA size markers (λ HindIII, Life Technologies, Inc.) are shown on the left.

It is quite likely that the progenitor of 2B and the HM r BNPs was a testis-specific variant of histone H1, which would account for the tissue and stage specificity of HM r BNP expression. Some testis-specific histones are notably more basic (and arginine-rich) than their somatic counterparts and have N- or C-terminal regions that contain short simple repeats of the SPKK motif (6,10,23,24) that are so abundant in the HM r BNPs. Amplification of similar repeats could have produced an extreme H1 variant like 2B to assist in regulated and reversible sperm chromatin condensation. Gene duplication, rearrangement, and loss of the globular H1-like domain in the HM r BNP progenitor have apparently given rise to this unique auxiliary sperm chromatin protein. Moreover, the demand for this protein seems to have led to the extensive expansion and amplification of its gene to the point where there are now at the very least 15 HM r BNP isoforms (13).
This recognition of HM r BNP origins is particularly interesting in relation to recent work on the "protamine-like" sperm proteins of the mollusk, Mytilus californianus. Carlos et al. (8,25) have demonstrated that two of the three major protamine-like proteins in this invertebrate are in fact post-translational cleavage products of an H1-like protein. Specifically, PL-IV was the C-terminal peptide of this protein, and PL-II* (or φ2B) encompassed the N-terminal peptide linked to an 84-amino acid trypsin-resistant globular region that shows ~40% similarity to the globular core of many histone H1s (11). The third major protamine-like protein (PL-III) was homologous to the N-terminal region of PL-II* (26). PL-II*, PL-III, and related proteins are enriched in SPKK motifs and also contain stretches of alternating S(R/K) (27), both of which are prevalent and phosphorylatable in the flounder HM r BNPs (28). In these two instances, nonclassical, protamine-like sperm proteins have turned out to be histone H1 derivatives. Because fishes that do not produce protamines often have additional quantities of linker histones (29), a common theme is beginning to emerge, that of extra linker histones and their derivatives compensating for the lack of protamine in the developing sperm nucleus. Indeed, it is quite likely that some of the additional sperm-specific proteins and other protamine-like proteins in the recent classification of Saperas et al. (29) will turn out to be H1 derivatives.
It is beginning to appear that the distinction among protamines, protamine-like proteins, and linker histones may simply be their stage of evolution. The idea that sperm basic chromatin proteins originated from histones was originally proposed by Subirana et al. (30). This hypothesis has since been refined by Ausio and co-workers to suggest that such proteins arose from a primitive histone H1, and it is supported by their extensive biochemical analysis of sperm nuclear proteins from a wide variety of lower eukaryotes (31)(32)(33). The isolation and characterization of winter flounder clone 2B provides evidence in vertebrates that specialized sperm chromatin proteins have evolved from the N-terminal tail of a progenitor linker histone.