Protamines from liverwort are produced by post-translational cleavage and C-terminal di-aminopropanelation of several male germ-specific H1 histones

Protamines are small, highly-specialized, arginine-rich, and intrinsically-disordered chromosomal proteins that replace histones during spermiogenesis in many organisms. Previous evidence supports the notion that, in the animal kingdom, these proteins have evolved from a primitive replication-independent histone H1 involved in terminal cell differentiation. Nevertheless, a direct connection between the two families of chromatin proteins is missing. Here, we primarily used electron transfer dissociation MS-based analyses, revealing that the protamines in the sperm of the liverwort Marchantia polymorpha result from post-translational cleavage of three precursor H1 histones. Moreover, we show that the mature protamines are further post-translationally modified by di-aminopropanelation, and previous studies have reported that they condense spermatid chromatin through a process consisting of liquid-phase assembly likely involving spinodal decomposition. Taken together, our results reveal that the interesting evolutionary ancestry of protamines begins with histone H1 in both the animal and plant kingdoms.

Protamines are a group of highly-specialized chromosomal sperm nuclear basic proteins (SNBPs) that are present in many organisms (1-4). They replace histones during the last stages of male germ terminal differentiation of spermiogenesis (5). They are small (25-100 amino acids), highly arginine-rich proteins (6,7) that belong to the broad family of intrinsically-disordered proteins (8). Despite their disorganized structure in solution, protamines from both chordate and nonchordate organisms are able to adopt secondary structure in the presence of helicogenic buffers (9) and upon binding to DNA in sperm (10). The functional role of these proteins appears to be multifaceted (11), but their ultimate biological significance is not yet properly understood. In addition to tightly compacting the sperm genome and protecting against DNA damage (11,12) during its exit from the body, protamines may also assist in erasing the somatic epigenetic contribution of histones. Protamines are sporadically but nonrandomly distributed (13,14) throughout the animal and plant kingdoms (3). Fish are a good example of this (13,15). Although the species and families of some orders contain only protamines (P-type), sperm from other orders might contain different SNBP types such as histones (H-type) or protamine-like (PL-type) proteins (4,16).
Despite the heterogeneity in SNBP types, the PL-type, in addition to its highly-basic amino acid (Lys/Arg) composition, contains a distinctive winged helix domain (WHD) that is evolutionarily related to the WHD of somatic linker H1 histones (17). Also, those organisms that retain the full histone complement in their sperm, like echinoderms or zebrafish, contain either an unusually-long sperm-specific histone H1 (18) or elevated levels of histone H1 (19), which in both instances assist in maintaining a highly-compacted chromatin organization.
Therefore, a structural (20) and evolutionary (16) role of the histone H1 family of chromosomal proteins (21) in shaping sperm chromatin organization is not surprising. However, although a connection between protamines (lacking the WHD) and histone H1 has been demonstrated, and organisms possessing a mixture of related PL-type and protamines have been described (22,23), this is the first observation of the exclusive presence of histone H1 ontogenically-related protamines. The results described below conclusively show that in Marchantia the post-translational processing of three highly-specialized H1 histones of the PL type, which are expressed in antheridiophores, gives rise to the protamines that are found in the sperm of this organism.

Initial evidence for the presence of protamines in Marchantia polymorpha sperm
The first evidence of protamines in M. polymorpha comes from a study published in 1978 by Reynolds and Wolfe (24). Fig. 1D of that paper shows an acetic acid urea-PAGE depicting a pattern of four small molecular mass proteins that is almost identical to that shown in Fig. 1a, lane 1. In both instances, the electrophoretic pattern is characterized by the complete absence of histones, and as it was initially concluded: ". . . histones are completely lost and replaced by proteins . . . resembling animal protamines" (see Ref. 24). Coarse electrophoretic fractionation of these proteins revealed a composition enriched in basic amino acids that was also important in reaching this conclusion. In a subsequent paper, the same authors extended their study to a broader spectrum of plants, including algae, bryophytes, and ferns, for which representative species containing protamines were also identified. Fig. 1 shows the electrophoretic pattern observed in the sperm of M. polymorpha SNBPs compared with those of some representative nonchordate and chordate animals. The amino acid sequence of the protamines from these organisms is arginine-rich (Fig. 1b), and in the nonchordate organisms a marine worm, Chaetopterus variopedatus (lane 4) (23), and a tunicate, Styela montereyensis (lane 5) (22), a WHD-containing PL structurally related to chordates exists (P1 in Fig. 1a, lanes 4 and 5). The high electrophoretic mobility of M. polymorpha SNBPs resembles that of the small protamines present in salmon sperm (Fig. 1a, lane 2). A very similar arginine-rich composition is observed in protamines of other nonchordate and chordate organisms which, like in salmon, exclusively contain P-type SNBPs in their sperm (4,7,25).

Marchantia genome and male transcriptome indicate a relation of its sperm protamines to histone H1
In 2014, we used SNBPs isolated from M. polymorpha sperm and Edman degradation N-terminal amino acid sequencing to publish a partial amino acid sequence corresponding to protamine P5 (26) (Fig. 1a, lane 1). This sequence information was more recently used to identify a histone H1 gene of the SNBP PL-type (termed MpPRM), which is expressed during spermiogenesis, from the analysis of the male gametogenesis transcriptome of this organism (27). The protein contains the characteristic histone H1 WHD (Fig. 2, a, PL-1, and b) in addition to several arginine-rich clusters at its C-terminal end.

Marchantia protamines and its post-translational cleavage
The more recent availability of the M. polymorpha genome (28) has allowed us to identify two additional arginine-rich PL proteins also containing a WHD (Fig. 2a, PL-2/PL-3). The phylogenetic relationship of these three PL proteins and their evolutionary connection to other M. polymorpha histone H1s, as well as to animal PLs, is shown in Fig. 2c. The PLs identified in this way are mainly expressed in antheridiophores (the M. polymorpha male sex organ), in contrast to the histone H1 counterparts, which are more similarly expressed in both antheridiophores and sporophytes (Fig. 2d).

Determination of the amino acid sequence of M. polymorpha SNBPs provides evidence for their PL origin
The sequences for the rest of the M. polymorpha SNBPs (P1-P4 in Fig. 1a) were determined using HPLC-coupled MS. The HPLC-fractionated proteins were fragmented using electron transfer dissociation (ETD) and collisionally-activated dissociation (CAD) (Fig. 3, a-e, see also Table 1 and Table S1) (29-31). The sequences obtained (see Table 1), primarily using ETD, confirmed the arginine-rich protamine nature of these proteins, and they revealed their structural relation to the three PL proteins that were bioinformatically identified above. These results conclusively show that M. polymorpha SNBPs are the post-translational cleavage products of three independent, histone H1-related, and protamine-like precursors.

M. polymorpha protamines are di-aminopropanelated at their C termini
Interestingly, in multiple ETD MS2 scans, a c-ion series could be deduced that covered the majority of the protein. However, there was a strong ion at 215.1740 m/z (z = 1) present in multiple species, denoted by the star in Fig. 3a. From this ion, a series of amino acids was determined that complemented the c-ion series. Therefore, the ion at 215.1740 m/z must be a z• ion. Based on the sequence deduced from the c-ion series, the database BLAST search provided the protein from which the observed species originated as well as the final residue in the protein. This ion corresponded to an arginine with an unknown modification of 56.0740 Da greater than an unmodified arginine. This unknown modification was also observed on C-terminal lysines.
A theoretical elemental composition was calculated to assist in the determination of the modification present on the z1+• ion. The molecular formula C9H21N5O was determined to be the closest combination based on parts/million error. Interestingly, there was only one oxygen atom present in the theoretical composition. This would place the modification on the C terminus because an unmodified C terminus would have two oxygens. After accounting for the arginine side chain (C4H11N3) and the protein backbone (C2HO), the elements C3H9N2 remain.
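The parts/million comparison and the elemental bookkeeping described above can be reproduced with a short calculation. The following is a minimal sketch, not the authors' software: it assumes standard monoisotopic atomic masses, treats C9H21N5O as the full composition of the singly charged ion (one electron removed), and uses hypothetical helper names (`parse_formula`, `cation_mz`, `ppm_error`).

```python
import re
from collections import Counter

# Monoisotopic masses (u) of the most abundant isotopes (standard values).
MONO = {"C": 12.0, "H": 1.00782503, "N": 14.00307401, "O": 15.99491462}
ELECTRON = 0.00054858  # electron mass (u)

def parse_formula(formula):
    """Turn a simple formula such as 'C9H21N5O' into element counts."""
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts

def cation_mz(formula, charge=1):
    """m/z of a cation whose full composition is `formula` (assumption)."""
    mass = sum(MONO[el] * n for el, n in parse_formula(formula).items())
    return (mass - charge * ELECTRON) / charge

def ppm_error(observed, theoretical):
    """Signed mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6

theoretical = cation_mz("C9H21N5O")
error = ppm_error(215.1740, theoretical)

# Subtract the pieces named in the text from the ion composition.
remainder = parse_formula("C9H21N5O")
remainder.subtract(parse_formula("C4H11N3"))  # arginine side chain (from the text)
remainder.subtract(parse_formula("C2HO"))     # backbone contribution (from the text)
remainder = {el: n for el, n in remainder.items() if n}

print(f"theoretical m/z = {theoretical:.4f}, error = {error:.2f} ppm")
print("remaining elements:", remainder)
```

Under these assumptions the observed ion agrees with the proposed formula to well within 1 ppm, and the leftover composition is C3H9N2, as stated in the text.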
To determine the structure of the modification, the ion at 215.1740 m/z was first formed by ETD and then re-isolated for fragmentation by high-energy collisional dissociation (HCD) to give the MS3 scan shown in Fig. 3d. Because of the guanidino group present on the arginine side chain, the charge should be stabilized within this group. Therefore, the structure can be determined from additions to the arginine side chain. As shown in Fig. 3d, the ions present confirm the structure shown in Fig. 3e. All identified protamines contained this modification. The modification was absent in other observed proteins from Marchantia (results not shown), indicating that this modification is specific to protamines.

Chromatin organization during M. polymorpha spermiogenesis
Organisms where SNBPs of the protamine type undergo a post-translational processing involving extensive protein cleavage often exhibit a characteristic nuclear organization during this process. A transition from lamello-fibrillar chromatin organization to a complete electron-dense highly-compacted organization in their mature sperm is observed (7). Such a transition has been observed both in internally fertilizing invertebrate (26,32) and vertebrate organisms (33-35). Fig. 4 shows an electron microscopic image of spermatids in developing M. polymorpha antheridia (36) (the haploid structure producing male gametes). These spermatids have been shown to contain PL-1-type protamines until the late stages of spermiogenesis (27). As can be seen in Fig. 4, chromatin adopts a distinct fibrillar organization with dispersed uniform fibers of 24 ± 3-nm thickness. The fibers appear clearly dispersed and further coalesce into a uniform, highly-condensed, and electron-dense opaque nucleus in the mature sperm (37). This type of chromatin organization has also been observed in the liverwort Blasia pusilla, a bryophyte (26). The peculiar transitional chromatin organization before complete condensation observed in these organisms has been proposed to be mediated by the liquid-liquid phase separation mechanism of spinodal decomposition (26). The process is framed on a physicochemical model that involves "kinetic, equilibrium, and structural aspects of a system in route to equilibrium." In it, the less electron-dense nucleoplasm and the chromatin appear to be continuous rather than one disperse and one continuous phase (7).

Protamines in plants
The occurrence of protamines in the sperm of Marchantia may have broader implications for the overall evolution of SNBPs. As pointed out in earlier work, the shift to the convergent presence of SNBPs in the sperm of both plants and animals might have involved selection pressure to reduce sperm nuclear weight and volume for a swimming male gamete (24) in internally fertilizing organisms. Indeed, protamines appear to be present in more primitive lower plants with motile sperm and are much less ubiquitous in the male nuclei of the pollen grain (3). In this regard, it is interesting that while in animals protamines may be present in both internal and external fertilizers, the last type of fertilization represents an SNBP evolutionary bottleneck. Once organisms from a clade have acquired this fertilization mode, there is no return to the histone or PL SNBP types (38,39). Moreover, evolutionarily-driven interspecific changes in the regulation of protamine genes and amino acid sequence in mice and mammals have been shown to confer an interspecific sperm competition (fertility) advantage (40,42,43). Thus, in addition to providing genome compaction/protection against DNA damage and somatic histone epigenetic clearance (6,11), protamines might have evolved to fine-tune the structural chromatin features of the nucleus (44). These changes optimize the swimming potential of internal fertilizers, not only in the animal kingdom but also in plant internal fertilizers, such as the bryophytes, including liverworts, mosses, and hornworts (45).

Protamines and histone H1
Fig. 5 summarizes the relationship between the SNBPs of M. polymorpha (Fig. 1a, lane 1) and their histone H1-related PL precursors. Once more, these results emphasize the evolutionary relatedness of protamines to histone H1, not only in animals (4, 16) but also in plants (Fig. 2c). Although protamines are the only SNBPs found in M. polymorpha mature sperm, their PL histone H1-related precursors are present in the spermatids in the early stages of spermiogenesis (27). Moreover, M. polymorpha protamines present in the final sperm are the product of post-translational cleavage of their precursors (Fig. 5b).
Post-translational cleavage of protamine precursors is quite a general occurrence both in protostome and deuterostome species, which appears to be independent of their histone H1 origin, and it also appears to have been the product of evolutionary convergence (46). For instance, the protamines of cephalopods (Sepia officinalis (cuttlefish) (47)/Loligo opalescens (squid) (46)) and the protamine P2 in mammals (mouse (48) and human (49)) are the products of a gene encoding a protein with an N-terminal leading domain consisting of a mixture of neutral/polar amino acids and a highly-arginine-rich C-terminal end. The N-terminal leading sequence is gradually removed during the late stages of spermiogenesis following DNA binding (6,48), and hence it is hypothesized to play an important role in proper chromatin deposition and condensation.

Protamine post-translational processing and sperm chromatin condensation
The post-translational modifications (PTMs) undergone by M. polymorpha protamine precursors include cleavage as well as C-terminal di-aminopropanelation (Fig. 5, b and c). Diaminopropane is a product of the oxidation of spermine and spermidine that is catalyzed by the FAD-dependent polyamine oxidases commonly found in monocotyledonous plants (50,51) and that in A. thaliana have been shown to participate in polyamine metabolism with important involvement in plant development (50). Upon activation of the arginine α-carboxyl group in the M. polymorpha protamine, it can readily acylate 1,3-diaminopropane to produce the PTM observed here. Of note, oxidation of spermine by diamine oxidase occurs in human seminal plasma and is related to sperm fertility (52,53).
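As a consistency check on this chemistry, the mass added by forming an amide between a C-terminal carboxyl group and 1,3-diaminopropane can be compared with the 56.0740-Da shift reported under "Results." The sketch below is illustrative only; it assumes standard monoisotopic atomic masses and models the reaction simply as addition of 1,3-diaminopropane (C3H10N2) with loss of one water.

```python
# Monoisotopic masses (u) of the most abundant isotopes (standard values).
MONO = {"C": 12.0, "H": 1.00782503, "N": 14.00307401, "O": 15.99491462}

def mono_mass(counts):
    """Monoisotopic mass of a composition given as {element: count}."""
    return sum(MONO[el] * n for el, n in counts.items())

diaminopropane = {"C": 3, "H": 10, "N": 2}  # H2N-(CH2)3-NH2
water = {"H": 2, "O": 1}                    # lost on amide bond formation

# Net mass added to the C terminus by the condensation (assumed mechanism).
delta = mono_mass(diaminopropane) - mono_mass(water)

observed_shift = 56.0740                    # value reported in the text
difference_mda = (observed_shift - delta) * 1000.0

print(f"calculated shift = {delta:.4f} Da")
print(f"observed - calculated = {difference_mda:.2f} mDa")
```

The calculated shift is about 56.0738 Da, within a fraction of a millidalton of the reported value, and the group that remains attached, NH-(CH2)3-NH2, has the composition C3H9N2 quoted in the text.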
The occurrence of PTMs in mammalian protamines has been well-documented (6), where phosphorylation, acetylation, and methylation of different residues have been described (54). Although the functional role of these PTMs is not well-understood, serine phosphorylation was initially proposed to regulate the proper interaction of the highly positively-charged protamines with DNA, therefore ensuring a proper chromatin assembly (55-57). In those instances where cleavage of a protamine precursor takes place, like in mammalian protamine P2, phosphorylation does not occur until the initiation of the cleavage process (56).
Little is known about C-terminal PTMs, mainly because of the experimental difficulty of their analysis (58), and hence little is known about their functional involvement. However, half of the biologically active peptides and peptide hormones are α-amidated at their C termini, where the neutralization of the negative charge enhances their hydrophobicity and improves their receptor-binding activity (59). In this regard, the role of the carboxyl end di-aminopropanelation, observed for the first time here in chromosomal proteins, remains elusive. The possibility exists that, as in the case of phosphorylation, it has an important role in the chromatin transitions undergone during spermiogenesis. In particular, it might play a role during the PL to protamine transition, as it might neutralize the negative charge of the carboxyl end which, like in the hormone peptides (59), might otherwise interfere with proper protamine deposition onto DNA from a complex heterogeneous mixture of very small arginine-rich proteins (Figs. 1a, lane 1, and 5c).

Figure 5 (partial caption). . . . (dark red dots). c, amino acid sequence of the protamines (Fig. 1a, lane 1) identified by MS. d, schematic representation of the intermediate liquid phase chromatin condensation stage observed in spermatids (Fig. 4).
Protamine precursor trimming and phosphorylation have been implicated in chromatin patterning of the developing spermatid nucleus of those organisms undergoing liquid-phase condensation (60) through a process of spinodal decomposition (7,32,61). These events precede the final highly-compacted chromatin structure that is characteristic of mature sperm. The protein processing observed in M. polymorpha appears to play a similar role, as its spermatids exhibit this pattern type (Figs. 4 and 5d), in which the intrinsically disordered nature of the proteins involved (histone H1 and protamines) (8) plays a critical role (10,62-64). This is very similar to the phase separation process recently described for the heterochromatin domain formation by Drosophila HP1a (65) and other chromatin subcompartments (66). Indeed, sperm chromatin is a salient example of fully heterochromatinized nuclei.
Paraphrasing the title of the paper reporting the genome of M. polymorpha (28), the analysis of its SNBP composition and transitions, made possible through the analysis of its genome, provides a powerful evolutionary insight that transcends that of plants and has implications for the evolution of these protamines and their conserved chromatin organization encompassing the plant and animal kingdoms.
LCMS-grade water and LCMS-grade acetonitrile were purchased from VWR Scientific. LCMS-grade 2-propanol and formic acid were purchased from Thermo Fisher Scientific. Acetic acid, vasoactive intestinal peptide, and angiotensin I were purchased from Sigma.

Protein extraction
M. polymorpha sperm from antheridiophores was resuspended in 100 mM Tris-HCl (pH 7.5), 0.5% Triton X-100 containing 10 µg/ml tosyl lysine chloromethyl ketone buffer (at a ratio of 20 µl/1 antheridiophore) and centrifuged at maximum speed in an Eppendorf microcentrifuge. The pellet obtained in this way was suspended in 0.4 N HCl (at a ratio of 10 µl/1 antheridiophore) and homogenized in a small Dounce with 10 strokes. The HCl extract was precipitated overnight with 6 volumes of acetone at −20°C, and the next day it was centrifuged at maximum speed in an Eppendorf microcentrifuge. The protein pellet was washed with acetone at room temperature, centrifuged again under the same conditions, and dried in a speedvac. The protein pellet thus obtained was resuspended in water for further use.

Multiple sequence alignments
These were performed using MAFFT version 7.310 (70) in the Jalview2 package (71). The evolutionary history of the winged-helix globular domains (WHD) was inferred by using the maximum likelihood method based on the Le Gascuel model (72) in the software MEGA version 7 (73). The bootstrap consensus tree inferred from 1000 replicates is taken to represent the evolutionary history of the taxa analyzed. The percentage of replicate trees in which the associated taxa clustered together in the bootstrap test (1000 replicates) is shown next to the branches (only bootstrap values higher than 80% are shown). A discrete Gamma distribution was used to model evolutionary rate differences among sites (five categories (+G, parameter = 2.3546)). The analysis involved 13 amino acid sequences. All positions with less than 95% site coverage were eliminated. That means fewer than 5% alignment gaps, missing data, and ambiguous bases were allowed at any position. There were a total of 65 positions in the final dataset.
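The site-coverage criterion can be illustrated with a small column filter. This is a hedged sketch rather than MEGA's implementation: the function name `filter_by_site_coverage`, the set of characters counted as missing ("-", "?", and "X"), and the use of "greater than or equal to" at the threshold are all assumptions made for illustration, and the toy alignment is invented.

```python
def filter_by_site_coverage(alignment, min_coverage=0.95, missing="-?X"):
    """Keep only alignment columns whose fraction of informative
    residues is at least `min_coverage`.

    `alignment` is a list of equal-length aligned sequences.
    Returns (filtered_sequences, kept_column_indices).
    """
    if not alignment:
        return [], []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    n_seqs = len(alignment)
    kept = []
    for col in range(length):
        present = sum(1 for seq in alignment if seq[col] not in missing)
        if present / n_seqs >= min_coverage:
            kept.append(col)

    filtered = ["".join(seq[c] for c in kept) for seq in alignment]
    return filtered, kept

# Toy example with 13 sequences (invented data): one gap in column 1.
toy = ["KRSA"] * 12 + ["K-SA"]
filtered, kept = filter_by_site_coverage(toy)
print(kept)         # columns that survive the 95% cutoff
print(filtered[0])
```

With 13 sequences, a single gap already lowers the coverage of a column to 12/13 (about 92.3%), so under this reading the 95% cutoff retains only columns in which every sequence has a residue.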

Tertiary structure prediction
The tertiary structures of the globular domains of histone H1 and PL proteins were modeled in the SWISS-MODEL server (74) using the most closely-related structure, automatically searched by the software, as a template in each case. The obtained structures were rendered using the PyMOL Molecular Graphics System version 1.2r1.

Sample preparation
One µl of the acetone-precipitated protein-HCl extract suspension was diluted 10-fold with 0.1% acetic acid in water. 300 nl of the dilution, corresponding to ~0.2% of the total suspension, was pressure-loaded onto a reversed-phase column containing 10 cm of 3-µm diameter, 300 Å PLRP-s packing material (Polymer Laboratories) within a 360 × 75-µm fused silica capillary integrated with an electrospray tip (75,76). Additionally, 100 fmol of both vasoactive intestinal peptide and angiotensin I were loaded onto the column as internal standards.

Chromatography and mass spectrometry
An Agilent Technologies (Palo Alto, CA) 1100 Series binary HPLC system coupled to a Thermo Fisher Scientific Orbitrap Fusion Tribrid mass spectrometer (San Jose, CA) operating in standard pressure mode was used to characterize the proteins in the sample (77). Proteins that could not be retained on the column due to high hydrophilicity were identified using an isocratic elution at 100% solvent A (0.3% formic acid in water) for 25 min at a flow rate of ~50 nl/min (Fig. S1). Following the identification of proteins not retained on the column, the column was washed with solvent A for an additional 25 min at ~100 nl/min to remove any additional salts. The remaining proteins were then eluted using a gradient of 0-10-30-70-100% solvent B (72% acetonitrile, 18% 2-propanol, 10% water, and 0.3% formic acid) in 0-10-20-30-35 min at a flow rate of ~100 nl/min (Fig. S2).
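The multistep gradient can be written out as a lookup of solvent B composition against time. The sketch below reads the program as breakpoints of 0, 10, 30, 70, and 100% B at 0, 10, 20, 30, and 35 min and assumes linear ramps between breakpoints; the name `percent_b` is hypothetical, and the function is not part of any instrument software.

```python
from bisect import bisect_right

# (time in min, % solvent B) breakpoints as read from the text;
# linear interpolation between them is an assumption.
GRADIENT = [(0, 0), (10, 10), (20, 30), (30, 70), (35, 100)]

def percent_b(t_min, program=GRADIENT):
    """Percent solvent B at time `t_min`, clamped to the program ends."""
    times = [t for t, _ in program]
    if t_min <= times[0]:
        return float(program[0][1])
    if t_min >= times[-1]:
        return float(program[-1][1])
    i = bisect_right(times, t_min)           # index of the next breakpoint
    (t0, b0), (t1, b1) = program[i - 1], program[i]
    return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)

for t in (5, 15, 25, 32.5):
    print(f"{t:>5} min: {percent_b(t):.1f}% B")
```

Under the linear-ramp assumption, the steepest segment is the last one (70 to 100% B in 5 min, or 6% B/min), compared with 1% B/min over the first 10 min.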
Proteins were selected for fragmentation based on a 60,000 resolution Orbitrap MS1 scan. Using top speed mode with a 3-s cycle time, proteins with a charge state of 4 and higher were isolated by the quadrupole with a width of 1.5 m/z and fragmented by ETD and CAD (29-31). All MS2 scans were analyzed in the Orbitrap at 60,000 resolution with an AGC target of 2.0e5 and a maximum injection time of 100 ms. ETD was performed on ions with m/z 300-700 using calibrated reaction times (78,79) (Table S2). CAD was performed on ions with m/z 400-1500 at 30% normalized collision energy. Ions were placed on an exclusion list for 30 s after three scans in 30 s. For only the isocratic identification, all scans included an in-source dissociation energy of 40 to break up any adducts that may be present due to the acetone precipitation.

Determination of C-terminal di-aminopropanelation by MS3
A targeted MS3 method was used to determine the structure of the z1+• ion present in many species because it did not correspond to a known amino acid (80,81). First, the precursor mass of 439.18 m/z (z = 9) was isolated with the quadrupole at a width of 1.5 m/z, an AGC target of 5.0e5, and a maximum injection time of 300 ms. These ions were then fragmented by ETD using 2.0e5 reagent ions and a reaction time of 12 ms. The fragment ion at 215.174 m/z (z = 1) was then isolated with a width of 1.5 m/z and fragmented by HCD using 30% HCD collisional energy. The resulting fragment ions were analyzed in the Orbitrap at 60,000 resolution and a scan range of 50-220 m/z.
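For orientation, the neutral mass implied by the isolated precursor can be estimated from its m/z and charge. This is a minimal sketch that assumes all nine charges arise from protonation and that 439.18 is a representative m/z for the species (it may instead be the center of the isolation window); the helper names are hypothetical.

```python
PROTON = 1.00727647  # proton mass (u)

def neutral_mass(mz, charge):
    """Neutral mass of an [M + zH]z+ ion from its m/z and charge."""
    return charge * (mz - PROTON)

def mz_from_mass(mass, charge):
    """m/z of the [M + zH]z+ ion for a given neutral mass."""
    return mass / charge + PROTON

precursor_mass = neutral_mass(439.18, 9)
print(f"neutral mass = {precursor_mass:.2f} Da")

# m/z expected for neighboring charge states of the same species.
for z in (8, 9, 10):
    print(f"z = {z}: m/z = {mz_from_mass(precursor_mass, z):.2f}")
```

Under these assumptions the precursor corresponds to a species of about 3943.55 Da, in keeping with the small size expected for a protamine.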

MS data analysis
The data files were manually inspected for the species present. All fragmentation scans from the major species were manually sequenced de novo by averaging all MS2 scans under the peak using Qual Browser version 4.0.27.10 (Thermo Fisher Scientific). Sequenced proteins were identified by searching against M. polymorpha in the NCBI Protein BLAST nonredundant protein sequences database (as of 03/22/2019) (82,83).

EM
EM was carried out as described previously (36). Briefly, M. polymorpha antheridia were fixed overnight with paraformaldehyde (2%) and glutaraldehyde (2%) in 0.05 M cacodylate buffer (pH 7.4) at 4°C. The samples were next washed three times with cacodylate buffer and post-fixed with osmium tetroxide (2%) in the same buffer for 3 h at 4°C. The samples were dehydrated by rinsing them for 30 min with increasing concentrations (50, 70, and 90%) of ethanol at 4°C, and this was repeated four times with 100% ethanol at room temperature. The samples were finally left overnight at room temperature in 100% ethanol. The samples were subsequently embedded twice in propylene oxide (PO) for 30 min and then transferred to a 70:30 mixture of PO and Quetol-651 resin (Nisshin EM Co.) for 1 h. After evaporation of PO, the samples were transferred to 100% resin, and polymerization was allowed to proceed for 48 h at 60°C. Ultra-thin sections were stained with 2% uranyl acetate and lead stain solution (Sigma) and visualized on a JEM1400 Plus (Jeol Ltd.) transmission electron microscope.