Protamines, in the Footsteps of Linker Histone Evolution

Histones, Protamines, and Protamine-like Proteins
It was perhaps a lucky coincidence that the early attempts to establish the chemical composition of the cell nucleus were carried out on such diverse biological systems as salmon sperm heads (1) and goose and chicken erythrocytes (2). Examination of the sperm and erythrocyte systems led, respectively, to the concepts of protamine and histone (3). We know with certainty that, with the exception of the male gametes, all cells contain histones exclusively. Hence, in metazoans, protamines (4,5) are confined to sperm nuclear chromatin, and even among sperm, protamines are not always present. Indeed, a large number of metazoans contain somatic-like histones in their sperm, and some crustaceans (order Decapoda) lack chromosomal proteins in their sperm altogether (6,7). Therefore, in contrast to the somatic nucleus, sperm chromatin can have a much more diverse protein composition. It was not until David Bloch's first attempt to classify the sperm nuclear basic proteins (SNBPs) (6,8), an effort later extended by Harold Kasinsky (7), that a clearer picture began to emerge.
More recently, an enormous effort has been made in several laboratories, including our own (9–15), to extend this analysis to a large number of representative organisms from different phylogenetic groups. With this broader perspective now available, SNBP heterogeneity can be reduced to three major groups or types: histone (H), protamine (P), and protamine-like (PL) (16).
Histones consist of core histones (histones H2A, H2B, H3, and H4) and linker histones (histone H1 family), names that refer to the structural roles of these proteins. Core histones constrain DNA wrapped around a histone core to produce a nucleoprotein complex (the chromatin subunit) known as the nucleosome core particle. Linker histones bind to the linker DNA connecting adjacent nucleosome core particles and assist in the folding of the chromatin fiber (17).
Protamines are a compositionally and structurally highly heterogeneous group of proteins (9). They exhibit a high charge density and an arginine-rich composition (13), a feature that is most likely related to the higher affinity with which this basic amino acid, compared with lysine, binds to DNA (18). They lack any secondary structure in solution but may adopt a folded conformation upon interaction with DNA. Protamine-like proteins share compositional and structural features with both histones and protamines (9,16). Hence they represent a structurally intermediate group, which is discussed more extensively in the following sections of this review.
To a certain extent, all three types of SNBP can be considered structurally analogous, as all of them produce folded chromatin fibers of 30–50 nm (19), regardless of the particular structure of the individual nucleoprotein complexes. At the functional level, somatic histones bind to DNA in a highly dynamic way that not only helps fold the genome but also plays an important role in the epigenetic regulation of gene expression (20,21). In contrast, the protamine and protamine-like SNBPs of spermatozoa bind very tightly to the genome to achieve maximal compaction, which completely erases the epigenetic information carried by the paternal histones (9) in this terminally differentiated system. This epigenetic silencing can be reversed only with the assistance of highly specialized protamine-removing proteins, such as nucleoplasmin, during the nuclear events of early fertilization (22).
The similar DNA-condensing potential of the three types of SNBPs raises the question of the extent of structural homology between them. In the following sections, we discuss a series of recent papers suggesting that protamines and protamine-like proteins are evolutionarily related to linker histones.

Sperm Nuclear Basic Proteins of Mollusks Have a Linker Histone Precursor
The overall structural heterogeneity of PL- and P-type SNBPs is best visualized when these proteins are compared within closely related phylogenetic groups. Mollusks provide one such example. An early comparative study of SNBPs within this group (23) revealed, in some species, large proteins with lower electrophoretic mobility than histones and a composition rich in both arginine and lysine, such as those described in the clam Spisula solidissima. In other species, such as the cephalopods Octopus vulgaris and Eledone cirrhosa, the SNBPs had higher electrophoretic mobility and compositions ranging from arginine-rich, as in fish and bird protamines, to highly cysteine-rich, as in mammalian protamines. The proteins of these three organisms have now all been sequenced (24–26).
An initial structural characterization of the large Spisula SNBP component showed that the protein has a tripartite organization, with a globular central core bearing strong sequence similarity to the winged helix domain of histone H5 from bird erythrocytes, another terminally differentiated system (27). SNBPs with similar characteristics are widely distributed throughout bivalve mollusks and have been called protamine-like protein I (PL-I) (28,29). The regions flanking this core are unstructured and rich in both arginine and lysine (26).
PL-I proteins have now been identified not only in mollusks but also in tunicates and several fish (9, 30–32), where they represent the major SNBP component. In all instances (Fig. 1A), they contain an internal folded domain corresponding to the winged helix motif (33,34) that is characteristic of the linker histones.
Notably, in mollusks (35) and tunicates (30), the PL-I protein can undergo post-translational cleavage, giving rise to a series of smaller PL proteins with progressively higher arginine content. In some instances, as in Mytilus (mussel), the PL-III proteins appear to be encoded by independent genes. These observations have been taken as an indication that all PL proteins, and possibly protamines, are related to a primitive linker histone precursor (16), probably one related to the replication-independent (RI) lineage that gave rise to the highly differentiated histone H5 of nucleated vertebrate erythrocytes (36,37).
Interestingly, a potential structural relationship between histone H5 and protamines has also been described in other invertebrates. Although there is still very little information about insect protamines (13), a putative Drosophila protamine-like protein, which shares some similarity with histone H5 and with the cysteine-rich mammalian protamines, has been identified in screens of transcripts expressed in the male germ line (38).

Protamine-like Proteins from Tunicates and the Lysine to Arginine Transition
One of the major conceptual stumbling blocks in explaining the transition from linker histones to protamines has been accounting for the evolutionary shift from the highly lysine-rich composition (25–30 mol %) characteristic of histone H1 molecules (39) to the arginine-rich composition (≥30 mol %) of protamines. Although all PL proteins are rich in both lysine and arginine (Arg + Lys = 35–50 mol %), their composition remains distinct from that of the predominantly arginine-rich protamines.
An important breakthrough in this direction came from a recent study of the PL proteins from two closely related tunicates: Styela montereyensis and Ciona intestinalis (30). The SNBPs of the former consist of two PL-I-related proteins, P1 and P2 (10,40). Amino acid sequence analysis showed that these two proteins are indeed related, with the faster-migrating component corresponding to the C-terminal domain of the larger component (Fig. 1B). Furthermore, the faster PL component (P2) has an arginine-rich composition (58 mol %) (10) and, more importantly, consists of repeated arginine clusters, which are characteristic of many invertebrate and vertebrate canonical protamines (13). However, this SNBP composition appears to be restricted to the genus Styela, as other tunicate species contain only the larger PL precursor molecule (P1), which apparently has not undergone post-translational cleavage (10).
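The compositional figures used in this section (Lys 25–30 mol % for H1, Arg ≥30 mol % for protamines, Arg + Lys = 35–50 mol % for PL proteins, 58 mol % Arg for Styela P2) are residue mole percentages. The sketch below computes the equivalent values directly from a one-letter protein sequence, rather than from amino acid analysis of a hydrolysate. The function name and the toy peptide are hypothetical illustrations and are not taken from any of the cited sequences.

```python
from collections import Counter


def basic_residue_mol_percent(sequence: str) -> dict:
    """Return the Lys, Arg and Lys+Arg content of a protein in mol %.

    `sequence` is a one-letter amino acid string; whitespace is ignored
    and lowercase letters are accepted.
    """
    residues = "".join(sequence.split()).upper()
    if not residues:
        raise ValueError("sequence contains no residues")
    counts = Counter(residues)          # missing residues count as 0
    total = len(residues)
    lys = 100.0 * counts["K"] / total
    arg = 100.0 * counts["R"] / total
    return {"Lys": lys, "Arg": arg, "Lys+Arg": lys + arg}


if __name__ == "__main__":
    # Hypothetical 10-residue toy peptide, not a real SNBP sequence.
    toy = "RRRRKKAAAA"
    print(basic_residue_mol_percent(toy))
    # prints {'Lys': 20.0, 'Arg': 40.0, 'Lys+Arg': 60.0}
```

Composition alone gives only a first indication. The H, P, and PL classification discussed in this review also rests on structural features, such as the presence or absence of the winged helix domain.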
In silico analysis of the genome sequence available for the tunicate Ciona intestinalis revealed that the single unprocessed PL-I present in this species has an amino acid sequence strikingly similar to that of the larger Styela component, except that its C-terminal region is lysine-rich (Fig. 1B). Careful analysis of the genomic nucleotide sequences encoding Styela and Ciona PL-I shows that the transition from lysine to arginine may have occurred as a result of a single frameshift mutation in the C-terminal domain of these molecules. In this way, a lysine-rich PL-I could have been converted into an arginine-rich PL-I very rapidly and, most remarkably, at the time when the arginine-rich PL-I underwent post-translational cleavage (Fig. 1B). However, some residues conserved within the C-terminal regions of Styela and Ciona PL-I cannot be accounted for solely by a single frameshift mutation. This suggests that, although a Lys to Arg transition via frameshift would be the most efficient route, other processes may also have been involved. Work is currently in progress in our laboratory to identify these mechanisms.
This rapid mode of evolution agrees well with the notion that reproductive traits (including reproductive proteins) evolve very quickly (41). It is also consistent with experimental evidence indicating that, despite their rather simple amino acid composition and high arginine content, protamines are excellent molecular markers for evolutionary studies (13).
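The frameshift argument for the Ciona/Styela PL-I comparison relies on the degeneracy of the genetic code. Lysine is encoded by AAA and AAG, and arginine by CGN, AGA, and AGG. In the right sequence context, shifting the reading frame through a run of lysine codons can therefore produce a run of arginine codons. The sketch below illustrates this with a hypothetical toy sequence: a start codon, one alanine codon, and an (AAG)n repeat, followed by a single-nucleotide deletion. It is not the actual Ciona or Styela sequence, and the text does not specify whether the real mutation was an insertion or a deletion.

```python
# Standard genetic code, bases ordered T, C, A, G at each codon position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate DNA in frame 0; '*' marks stops, trailing partial codons are ignored."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE[dna[pos:pos + 3]] for pos in range(0, len(dna) - 2, 3)
    )


def delete_base(dna: str, index: int) -> str:
    """Return `dna` with the nucleotide at `index` removed (a -1 frameshift)."""
    return dna[:index] + dna[index + 1:]


if __name__ == "__main__":
    # Hypothetical toy gene: ATG (Met), GCT (Ala), then five AAG (Lys) codons.
    wild_type = "ATGGCT" + "AAG" * 5
    mutant = delete_base(wild_type, 6)  # drop the first A of the repeat
    print(translate(wild_type))  # MAKKKKK
    print(translate(mutant))     # MARRRR (AGA codons; 2 leftover bases ignored)
```

The outcome depends on codon context. A run of AAA codons reads AAA in every frame, so it would remain lysine after the same shift.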

Evolution of Histone H1 and Evolution of Protamines, How Related Are They?
It is interesting to note that, in contrast to core histones, whose evolutionary origins can now be traced back to archaebacteria (42,43), histone H1 appears to have originated earlier, in eubacteria (44) (Fig. 2). Equally interesting, although archaeal histones contain the histone fold structure characteristic of core histones (45), they lack the tails that flank this domain in higher eukaryotes (42). Linker histones, by contrast, acquired their characteristic domains in the reverse order. They were initially composed only of a C-terminal region, and the core domain containing the winged helix motif was acquired later in their evolution (Fig. 2) (44). Indeed, many protozoans contain a linker histone consisting only of the characteristic APK-rich C-terminal domain, which is critical for the stabilization of the folded chromatin structure (46).

FIGURE 1. … (30) (the secondary structure is highlighted above the alignment). Sequences were aligned using the BIOEDIT (65) and CLUSTAL_X (66) programs with default parameters. Residues are colored according to their side chain properties: basic (blue), nonpolar hydrophobic (purple), acidic (red), and polar uncharged (green). Residues matching the reference sequence (chicken H5) are indicated by dots, and gaps are indicated by dashes. Yellow background shading marks conserved residues. The tertiary structure of the globular core of chicken erythrocyte histone H5 was rendered from the coordinates determined in Ref. 34. This structure was subsequently used as a template to model the three-dimensional structures of the globular regions of both Spisula and Styela SNBPs using the SWISS-MODEL server (67). The angles of rotation (y-axis) are shown next to the larger views to which they refer. The GenBank™ accession numbers for the sequences shown are as follows: chicken H5, P02259; Spisula PL-I, AY626224; Styela P1, AY332242. B, differentiation of the arginine-rich P1 SNBP of the tunicate Styela as a result of a gene frameshift mutation (30). The mutation occurred at a point (indicated by a blue star) in the C-terminal domain of a lysine-rich precursor of the tunicate C. intestinalis. The corresponding amino acid sequence alignments are shown below. Changes from lysine to arginine residues are highlighted by yellow and red shading. The Ciona P1 sequence was identified in the draft genome sequence of C. intestinalis by a BLAST search using the Styela P1 sequence as the query (30). The red arrow points to the site of post-translational cleavage.
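The Figure 1 alignment uses a common display convention: residues identical to the reference sequence (chicken H5) are replaced by dots, and gaps are shown as dashes. The sketch below renders that convention, assuming the sequences have already been aligned to equal length (for example, by a program such as CLUSTAL_X). The function name and the toy sequences are hypothetical and do not reproduce the figure.

```python
def dot_alignment(reference: str, aligned: dict) -> dict:
    """Render aligned sequences relative to a reference sequence.

    A residue identical to the reference residue in the same column
    becomes '.', a gap stays '-', and any other residue is kept as-is.
    """
    rendered = {}
    for name, seq in aligned.items():
        if len(seq) != len(reference):
            raise ValueError(f"{name}: length {len(seq)} != {len(reference)}")
        columns = []
        for ref_res, res in zip(reference.upper(), seq.upper()):
            if res == "-":
                columns.append("-")      # gap in this sequence
            elif res == ref_res:
                columns.append(".")      # identical to the reference
            else:
                columns.append(res)      # substitution (or residue opposite a reference gap)
        rendered[name] = "".join(columns)
    return rendered


if __name__ == "__main__":
    reference = "ACDEFGH"  # hypothetical reference, not chicken H5
    rows = dot_alignment(reference, {"toy_1": "ACKEF-H", "toy_2": "AC-EYGH"})
    print(f"{'reference':9s}", reference)
    for name, row in rows.items():
        print(f"{name:9s}", row)
```

Showing only the differences makes substitutions stand out against the reference. In Figure 1, this highlights where the Spisula and Styela globular cores depart from the H5 winged helix domain.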
The long-term evolution of the histone H1 family has recently been shown to be best described by a birth-and-death process (36,37,47), a mechanism based on recurrent gene duplication events under strong purifying selection. This mechanism has favored the great diversification of the members of this family. The diversification was further enhanced by strong functional and structural constraints, which ultimately led the different H1 isoforms to acquire specific functions (Fig. 2).
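In a birth-and-death model, new gene copies arise by duplication while others are later deleted or become pseudogenes, so the size of a gene family fluctuates over time. The toy simulation below tracks only the number of copies. It does not model sequence divergence or purifying selection, and the function name, probabilities, and seed are illustrative assumptions rather than the analyses of refs. 36, 37, and 47.

```python
import random
from typing import List, Optional


def simulate_gene_family(
    initial_copies: int,
    duplication_prob: float,
    loss_prob: float,
    steps: int,
    seed: Optional[int] = None,
) -> List[int]:
    """Toy discrete-time birth-and-death model of gene family size.

    At each step every existing copy independently duplicates with
    probability `duplication_prob` and, independently, is lost (deleted
    or turned into a pseudogene) with probability `loss_prob`.
    Returns the family size before the first step and after each step.
    """
    for p in (duplication_prob, loss_prob):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    rng = random.Random(seed)
    copies = initial_copies
    sizes = [copies]
    for _ in range(steps):
        births = sum(rng.random() < duplication_prob for _ in range(copies))
        deaths = sum(rng.random() < loss_prob for _ in range(copies))
        copies += births - deaths  # deaths <= copies, so size never goes negative
        sizes.append(copies)
    return sizes


if __name__ == "__main__":
    # Arbitrary illustrative probabilities; not estimates for any real H1 family.
    print(simulate_gene_family(initial_copies=5, duplication_prob=0.05,
                               loss_prob=0.05, steps=20, seed=42))
```

With equal duplication and loss probabilities, the expected family size stays constant while individual runs drift. This is one simple way to picture recurrent duplication and loss without net expansion.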
In addition, H1 evolution has favored the differentiation of highly specialized isoforms such as histone H5 (57). H5 is a replication-independent H1 isoform restricted to the terminally differentiated nucleated erythrocytes of birds, which also appears to be present in amphibians (58) and reptiles. The sperm-specific PL-I proteins, which appear at the end of spermiogenesis (yet another terminal differentiation process) in some vertebrate and invertebrate organisms (16), would also belong to this category. PL-I protein evolution and its possible link to protamine evolution are summarized in Fig. 2, which shows a hypothetical model involving the loss of the winged helix domain upon the lysine to arginine transition in precursor PL-I proteins. Such a loss could, speculatively, be attributed to a gene duplication process, which has been common in both protamine evolution (59) and H1 evolution (36). Significantly, the PL genes of bivalve mollusks have been shown to occur in hypervariable restriction fragment length polymorphism (RFLP) regions (60).
The obvious changes in the structural properties of the proteins resulting from the lysine to arginine transition determined new mechanisms underlying protamine evolution. It is their high arginine content that allows protamines to tightly condense chromatin in the sperm nucleus. This feature represented a new (and the most important) constraint driving their evolution, different from those acting on somatic H1 proteins. In fact, positive Darwinian (adaptive) selection (61) and an unusual form of purifying selection (62) are the major forces that have shaped protamine evolution.
The lysine to arginine transition and gene segregation may have taken place several times in the course of evolution. Whether this has involved a mechanism similar to that described in tunicates remains to be established. In this regard, it is interesting to note that post-translational cleavage of PL-I precursors has arisen repeatedly in completely unrelated groups of organisms, such as bivalve mollusks and ascidian tunicates. Remarkably, in both instances the next step in the evolution of these two groups, cephalopods (25,63) and cephalochordates (10), has involved the acquisition of an independent protamine gene encoding a protein with characteristics almost identical to those of the arginine-rich PL-I fragments.

FIGURE 2. Schematic diagram of the evolutionary stages involved in the differentiation of the linker histone into the H1 subtypes of metazoan species. The multicellular condition arising from the differentiation of the kingdom Metazoa would have constituted a critical constraint in determining the differentiation between somatic and germinal histone H1 types, arising from a recurrent duplication process under strong purifying selection. Further evolutionary steps would have led to the structural and functional differentiation of the somatic subtypes (H1.1–H1.5, H1⁰, H5) in one direction. The evolution of the germinal H1 types, in contrast, would have been dictated primarily by sex-specific constraints, leading to the differentiation of sperm/testis-specific (HILS1, H1t2, and H1t) and egg-specific (H1M/B4, H1foo) histone subtypes. Further changes in the sperm-specific subtypes gave rise to the highly differentiated PL-I and PL-type SNBPs. We hypothesize that a transition from lysine to arginine, such as that observed in tunicates (30), in conjunction with a loss of the core region, could have resulted in the differentiation of the arginine-rich protamines of the P type. The lysine to arginine transition established a new constraint under which protamines of the P type are evolving, with a high global arginine content being positively selected.
In conclusion, if it can finally be proven that protamines encoded by independent genes are related to linker histones through the process described above, this relationship would close an interesting evolutionary cycle. In that cycle, the C-terminal domain of linker histones would have returned to the independent existence of its eubacterial/protozoan ancestry (see Fig. 2).