Unicellular ancestry and mechanisms of diversification of Goodpasture antigen–binding protein

The emergence of the basement membrane (BM), a specialized form of extracellular matrix, was essential in the unicellular transition to multicellularity. However, the mechanism is unknown. Goodpasture antigen–binding protein (GPBP), a BM protein, was uniquely poised to play diverse roles in this transition owing to its multiple isoforms (GPBP-1, -2, and -3) with varied intracellular and extracellular functions (ceramide trafficker and protein kinase). We sought to determine the evolutionary origin of GPBP isoforms. Our findings reveal the presence of GPBP in unicellular protists, with GPBP-2 as the most ancient isoform. In vertebrates, GPBP-1 assumed extracellular function that is further enhanced by membrane-bound GPBP-3 in mammalians, whereas GPBP-2 retained intracellular function. Moreover, GPBP-2 possesses a dual intracellular/extracellular function in cnidarians, an early nonbilaterian group. We conclude that GPBP functioning both inside and outside the cell was of fundamental importance for the evolutionary transition to animal multicellularity and tissue evolution.

The emergence of multicellular metazoans from unicellular protists coincided with the appearance of a specialized form of extracellular matrix, the basement membrane (BM) (1-5). A myriad of studies in bilaterians have revealed a multiplicity of functions of BMs in tissue development, homeostasis, and diseases (6, 7). These include compartmentalization and maintenance of tissue architecture, organizing growth factors, signaling gradients, guiding cell migration and adhesion, delineating apical-basal polarity, modulating cell differentiation during development, orchestrating cell behavior in tissue repair after injury, and guiding organ regeneration (4, 8-13). Recent developmental studies have shifted the view of BM from one of a static support structure to that of a dynamic scaffold that is regularly remodeled to actively shape tissues and direct cell behavior (5, 14-16).
Despite these advances, there is a major gap in knowledge on the composition and function of BM at the transition from unicellular protists to multicellular animals, a pivotal event in metazoan evolution. The functionality of the BM in bilaterians is conferred by a toolkit of proteins that assemble into a supramolecular scaffold (1, 2). These proteins include collagen IV, laminin, nidogen, and agrin/perlecan (17-19). Recently, additional proteins were identified: peroxidasin (20), lysyl oxidase II (21), and Goodpasture antigen-binding protein (GPBP) (22-24). Recent studies in nonbilaterian animals have shown that collagen IV and laminin (25) are the earliest components of basement membrane, as evidenced in ctenophores and sponges, the two oldest animal lineages, and were likely essential for animal multicellularity (2, 3).
Among BM proteins, GPBP was uniquely poised to also play a role in the appearance of BMs and the transition to multicellularity, owing to its multiple isoforms (23, 26, 27) with varied functions both inside (28, 29) and outside of cells (22-24, 30).
Human GPBP is encoded by the COL4A3BP gene and is transcribed into multiple isoforms (Fig. 1). GPBP-1 functions mainly outside the cell as a kinase that phosphorylates collagen IV (23) and plays a role in BM assembly (22, 24, 30). GPBP-2 (also referred to as GPBPΔ26 or CERT, ceramide transporter protein) functions mainly inside cells, translocating ceramides (26, 28). GPBP-3 is membrane-bound and functions to increase the secretion of GPBP-1 into the extracellular matrix (27). GPBP-1 and GPBP-2 are associated with a vast array of biological and pathological processes, including  30). An understanding of the evolution and divergence of GPBP isoforms may shed light on the role they played in the evolutionary transition to multicellular animals.
Importantly, comparison of metazoans to their unicellular relatives may shed light on the evolutionary transition to multicellularity in animals (35). Previous phylogenetic studies of GPBP-1 and GPBP-2 were based on genomic data (35-37) from bilaterian and some nonbilaterian animals. Here, we extended the phylogenetic studies to include analysis of newly available transcriptomic and genomic data from bilaterian and nonbilaterian animals and unicellular protists. Our findings reveal that GPBP-2 is the most ancient isoform, originating in the last common ancestor of filastereans, choanoflagellates, and metazoans. GPBP-2, having both intra- and extracellular functions in early metazoans, likely played a role in the evolutionary transition to multicellular animals.

Unicellular origin and evolution of GPBP
We traced the evolution of GPBP by analyzing transcriptomic data across multiple phyla. We used multiple-sequence alignments (MSAs) to characterize the six functional domains of GPBP (Fig. 1). Among these, the serine repeat motif 2 (SR2) domain is a distinguishing feature (26-28, 31). GPBP-1 and GPBP-3 both contain an SR2 domain and have extracellular related functions, whereas GPBP-2, characterized by the absence of an SR2 domain, has an intracellular function (28). GPBP isoforms containing an SR2 domain were only found in chordates, indicating that GPBP-1 and -3 are absent among invertebrate animals, choanoflagellates, and filastereans (Fig. 2A and Fig. S1). Isoforms lacking an SR2 domain were identified across all groups, indicating that GPBP-2 is conserved across animals, choanoflagellates, and filastereans (Fig. 2B and Fig. S2).
The conservation of the five other functional domains is unknown. We used MSAs to assess their phylogenetic distribution and conservation. The PH domain is an N-terminal phosphoinositide recognition domain that binds the Golgi membrane via phosphoinositide 4-monophosphate (23, 38). The COF (CERT, OSBP, and FAPP) motif (KWTNYIHGWQ) within the PH domain is essential for Golgi-specific recognition and binding (28, 39). MSA analysis reveals that the COF motif is well conserved throughout the animal kingdom. Among the representative animal species in our data set, only lamprey lacked conservation of the COF motif. Conservation of the COF motif extended to GPBP sequences from choanozoan and filasterean organisms (Figs. S1 and S2). The middle region of GPBP contains a FFAT motif, comprising two phenylalanine residues in an acidic tract (EFFDAXE), which interacts with the endoplasmic reticulum (ER) membrane protein vesicle-associated membrane protein-associated protein. Toward the C-terminal end of the protein, the START domain binds and transports ceramides (40). Multiple sequence alignments show strong conservation of the FFAT motif and START domain (Fig. 3 and Figs. S1 and S2). Collectively, our findings reveal that GPBP originated in the last common ancestor of metazoans, choanoflagellates, and filastereans, with GPBP-2 being the most ancient isoform (Fig. 3).
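As an illustration of how conservation of the two short motifs quoted above could be checked in a single protein sequence, the following Python sketch scans for the COF motif (KWTNYIHGWQ) and the FFAT consensus (EFFDAXE, reading X as any residue). It is a minimal sketch, not the MSA-based procedure used in this study; the function names and the toy sequence are hypothetical and serve only to show the mechanics.

```python
import re

# Motif strings as quoted in the text; "X" in the FFAT consensus is read as "any residue".
COF_MOTIF = "KWTNYIHGWQ"
FFAT_CONSENSUS = "EFFDAXE"


def consensus_to_regex(consensus: str) -> str:
    """Turn a consensus such as EFFDAXE into a regex string, mapping X to a wildcard."""
    return "".join("." if aa == "X" else re.escape(aa) for aa in consensus)


def find_motif(protein: str, consensus: str) -> list[int]:
    """Return the 1-based start position of every match of the consensus."""
    # The lookahead lets overlapping matches be reported as well.
    pattern = re.compile(f"(?=({consensus_to_regex(consensus)}))")
    return [m.start() + 1 for m in pattern.finditer(protein.upper())]


def motif_identity(window: str, consensus: str) -> float:
    """Fraction of non-wildcard consensus positions matched by an equal-length window."""
    if len(window) != len(consensus):
        raise ValueError("window and consensus must have the same length")
    scored = [(w, c) for w, c in zip(window.upper(), consensus) if c != "X"]
    return sum(w == c for w, c in scored) / len(scored)


# Hypothetical toy sequence (not a real GPBP sequence), built only for the demonstration.
toy = "MSD" + COF_MOTIF + "GGS" + "EFFDAVE" + "LK"
cof_hits = find_motif(toy, COF_MOTIF)
ffat_hits = find_motif(toy, FFAT_CONSENSUS)
```

`motif_identity` gives a per-window score that can be applied to the aligned region of a sequence in which the exact motif is not found, for example to quantify how far a divergent ortholog departs from the consensus.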

Genomic arrangement of COL4A3BP orthologs
Conservation of the collinear arrangement of genes (synteny) is considered a strong indication of functional conservation across species (41). Synteny analysis can be performed on a macro (genomic) or micro (genomic fragment) scale. We used microsynteny analysis to determine the presence or absence of shared synteny around COL4A3BP in diverse and phylogenetically relevant genomes. Human COL4A3BP was used as bait in genome database searches. No orthologs of COL4A3BP were detected in fungi or plant genomes. Orthologs of COL4A3BP were found in filasterean, choanozoan, and metazoan genomes.

The evolution and diversification of GPBP
Our analysis revealed that the genomes of unicellular organisms, invertebrates, and chordates each possess separate and differentiating patterns of gene clustering among genes immediately neighboring COL4A3BP. Analysis of vertebrate genomes revealed shared microsynteny in the genomic region containing COL4A3BP, DNA polymerase κ (POLK), 3-hydroxy-3-methylglutaryl-CoA reductase (HMGCR), ankyrin repeat domain 31 (ANKRD31), and ankyrin repeat and death domain-containing 1B (ANKDD1B) orthologs. Notably, the clustering of COL4A3BP and POLK on the same chromosome is a consistent feature throughout vertebrates (Fig. 4). Previously, we have demonstrated that POLK and COL4A3BP are oriented in a head-to-head fashion and share a bidirectional promoter (42). Our current findings provide the first evidence of an evolutionary link in their expression. In contrast, we found that unicellular and invertebrate genomes lack conservation of gene arrangement immediately around the COL4A3BP locus (Fig. 4). These findings illustrate a change in gene clustering and genomic arrangement of COL4A3BP at the evolutionary point of GPBP-1 emergence.
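The orientation terms used in the microsynteny comparison can be made concrete with a small Python sketch. It classifies two neighboring genes as head-to-head (divergent, 5′ ends facing, as expected for a shared bidirectional promoter), tail-to-tail (convergent), or tandem, and lists the neighbors two loci have in common. This is only a sketch under stated assumptions: the `Gene` record, the function names, and every coordinate and strand below are hypothetical placeholders, not values taken from any genome.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    name: str
    start: int   # leftmost coordinate on the chromosome/scaffold
    end: int     # rightmost coordinate on the chromosome/scaffold
    strand: str  # "+" or "-"


def orientation(a: Gene, b: Gene) -> str:
    """Classify the relative orientation of two genes on the same scaffold."""
    left, right = sorted((a, b), key=lambda g: g.start)
    if left.strand == "-" and right.strand == "+":
        return "head-to-head"  # divergent: the 5' ends face each other
    if left.strand == "+" and right.strand == "-":
        return "tail-to-tail"  # convergent: the 3' ends face each other
    return "tandem"            # both genes on the same strand


def shared_neighbors(locus_a: list[Gene], locus_b: list[Gene],
                     anchor: str = "COL4A3BP") -> set[str]:
    """Names of genes, other than the anchor, present at both loci."""
    names_a = {g.name for g in locus_a}
    names_b = {g.name for g in locus_b}
    return (names_a & names_b) - {anchor}


# Placeholder coordinates and strands, for illustration only.
polk = Gene("POLK", 1_000, 5_000, "-")
col4a3bp = Gene("COL4A3BP", 5_500, 9_000, "+")
hmgcr = Gene("HMGCR", 9_500, 12_000, "-")

pair_1 = orientation(polk, col4a3bp)
pair_2 = orientation(col4a3bp, hmgcr)
common = shared_neighbors([polk, col4a3bp, hmgcr], [col4a3bp, polk])
```

Gene names are compared as ortholog labels, so the sketch assumes orthology has already been assigned (for example, by the human ortholog names used in Fig. 4).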

Evolution of COL4A3BP gene
Evolutionary changes in gene structure occur through a variety of mechanisms that can directly contribute to proteome diversity (43-45). We sought to determine the impact of evolutionary changes in COL4A3BP gene structure on the divergence of GPBP isoforms through comparative mapping of intronic regions and analysis of 5′-UTR sequences. Our studies revealed that modifications to COL4A3BP gene structure follow the same evolutionary pattern revealed in our microsynteny analysis. Among unicellular protists and invertebrate orthologs, no conservation of COL4A3BP gene structure was observed. Intron sequence analysis uncovered 51 unique intron sequences and 11 intron gain events throughout the evolutionary history of COL4A3BP. Phylogenetic mapping of intronic sequences identified three intron gain events specific to vertebrates, one of which is positioned just before the SR2-encoding exon. A second gnathostome-specific intron was identified just after the SR2 domain-encoding exon (exon 11 in COL4A3BP). The conservation of these two introns is a distinguishing feature of COL4A3BP orthologs that encode SR2 domain-containing isoforms (GPBP-1 and GPBP-3). These findings demonstrate the mechanistic role of intron gains in the diversification of vertebrate isoforms of GPBP (Fig. 5 and Fig. S3).
Structurally, GPBP-3 is distinguished from GPBP-1 in humans by the presence of an 83-amino acid leader sequence that results from translation of the GPBP-1 mRNA transcript at an upstream alternative non-AUG translation initiation site located in an ORF spanning 129 residues (27). Comparative analysis of 5′-UTR sequences revealed the conservation of multiple non-AUG translation initiation sites, indicating a marsupial origin of GPBP-3 (Fig. 6).
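The idea of an upstream, in-frame non-AUG initiation site can be sketched in Python as follows. The function walks upstream from the annotated AUG one codon at a time, records codons that differ from AUG at exactly one position, and stops at the first in-frame stop codon, since translation could not read through it. This is a simplified stand-in and not the Virtual Ribosome procedure used in this study; the one-mismatch definition of a candidate start, the function names, and the toy sequence are assumptions made for illustration.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def near_cognate_codons() -> set[str]:
    """All nine codons that differ from AUG at exactly one position."""
    reference = "AUG"
    codons = set()
    for i, ref_base in enumerate(reference):
        for base in "ACGU":
            if base != ref_base:
                codons.add(reference[:i] + base + reference[i + 1:])
    return codons


def upstream_inframe_starts(mrna: str, aug_index: int) -> list[tuple[int, str, int]]:
    """Find candidate non-AUG starts upstream of, and in frame with, the annotated AUG.

    Returns (0-based position, codon, N-terminal extension in codons) tuples,
    ordered from the AUG outward. The scan ends at the first in-frame stop codon.
    """
    rna = mrna.upper().replace("T", "U")
    candidates = near_cognate_codons()
    hits = []
    pos = aug_index - 3
    while pos >= 0:
        codon = rna[pos:pos + 3]
        if codon in STOP_CODONS:
            break
        if codon in candidates:
            hits.append((pos, codon, (aug_index - pos) // 3))
        pos -= 3
    return hits


# Hypothetical toy transcript: CC | TAA | CTG | GCA | ATG | GCC (AUG at index 11).
toy_mrna = "CCTAACTGGCAATGGCC"
toy_hits = upstream_inframe_starts(toy_mrna, aug_index=11)
```

In practice, candidate sites would also be weighed by sequence context and by conservation across species, which is the comparison summarized in Fig. 6.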

GPBP at the dawn of multicellularity
Cnidaria is a basal phylum that is among the very first metazoans with a true basement membrane, a mesoglea that separates two cell layers. Numerous ECM components are conserved in cnidarians (46), including collagen IV, laminin, peroxidasin, SPARC, and usherin. Hence, Cnidaria, such as Nematostella vectensis (Fig. 7), are well suited to investigate the structure, expression, and localization of GPBP-2 in tissues of a basal metazoan species. The predicted length of Nematostella GPBP (NvGPBP) based on genomic data is 488 amino acids. In humans, GPBP-2 is composed of 598 amino acids. Our genomic analysis revealed several inversion errors in the GPBP coding region, which, when corrected, yielded a sequence of 615 amino acids. The expression of a single 615-amino-acid-long isoform was confirmed by transcriptome assembly and through mining of publicly available transcriptome libraries. As anticipated, NvGPBP-2 lacks an SR2 domain. NvGPBP-2 shares 65 and 50% amino acid sequence similarity with the pleckstrin homology domain and the START domain of human GPBP-2, respectively (Fig. 7B).

Figure 4. The schematic illustrates the three distinct patterns of shared synteny that delineate COL4A3BP gene family members. Chordate species have a distinguishing pattern of conserved gene arrangement around COL4A3BP, which consists of a head-to-head orientation of COL4A3BP and POLK and/or a head-to-head orientation of HMGCR and ANKDD1B. Among those teleost species believed to have undergone 3WR, a specific pattern of shared synteny consisting of a tail-to-tail orientation of COL4A3BP and HMGCR was found, whereas invertebrate orthologs of COL4A3BP lack a distinct pattern of gene clustering. (Gene orthologs are represented by arrows of the same color; gray boxes represent nonorthologous genes.) ANKDD1B, ankyrin repeat and death domain-containing 1B; ANKRD31, ankyrin repeat domain 31; COL4A3BP, collagen type IV α3-binding protein; GCNT3, glucosaminyl (N-acetyl)transferase 4, core 2; HMGCR, 3-hydroxy-3-methylglutaryl-CoA reductase; POLK, DNA polymerase κ. Genes are illustrated in the order in which they appear on chromosomes/scaffolds for each species. Where available, the chromosome/scaffold number is provided. Gene names refer to human orthologs as listed in the Ensembl database.
GPBP-2 is localized intracellularly in vertebrates; however, its localization in early metazoans is unknown. The localization of NvGPBP-2 was determined by immunohistochemical analysis using a mAb against human GPBP (N27). These analyses revealed a broad and diffuse distribution of NvGPBP-2 in both intracellular and extracellular compartments throughout the body column and tentacles (Fig. 7C). Fluorescent immunohistochemical studies reveal that NvGPBP-2 and NvCol4 co-localize at the border of the mesoglea (Fig. 7D). Additionally, studies were performed to determine whether recombinant NvGPBP-2 expressed in Sf9 insect cells was secreted or retained intracellularly. The results showed that rNvGPBP-2 was localized in both the cell pellet and the cell media (Fig. 7E), indicating both intracellular and extracellular locations. Moreover, NvGPBP-2 undergoes autophosphorylation, indicating that GPBP-2 has kinase activity, as demonstrated previously for vertebrate GPBP-2 (Fig. 7F). Together, these studies suggest that NvGPBP-2 functions both intracellularly and extracellularly and demonstrate the early metazoan origin of GPBP-2 kinase activity.

Discussion
There is increasing evidence that the molecular machinery for regulating development, homeostasis, and immune function was present in unicellular ancestors of animals, well before the appearance of multicellularity (47)(48)(49). The identification of proteins and mechanisms co-opted during the transition to multicellular animals is essential for understanding tissue evolution, development, and disease. Emerging evidence indicates that BM, an extracellular matrix scaffold composed of numerous proteins, played an essential role in this transition. Here, we sought to determine the lineage of GPBP, a BM protein that was uniquely poised to play a role in this transition, owing to its multiple isoforms (GPBP-1, -2, and -3) with varied functions both inside and outside of cells.
Our studies reveal that GPBP-2 was the most ancient isoform, emerging over 900 million years ago in the last common ancestor of filastereans, choanoflagellates, and metazoans (Fig. S1). In vertebrates, GPBP-2, also known as CERT, mediates a nonvesicular mechanism of ceramide transport from the ER to the Golgi apparatus (28). Presumably, this intracellular CERT function is operative in cnidarian species and unicellular protists. Moreover, GPBP-2 may possess a second intracellular function owing to its kinase activity described for vertebrates (26) and for the cnidarian N. vectensis (Fig. 7F). The detection of GPBP-2 in both the intracellular and extracellular compartments of N. vectensis further suggests that GPBP-2 functions as both an intracellular ceramide transporter and an extracellular kinase. It has been demonstrated that phosphorylation of the SR1 domain decreases the intracellular function of GPBP-2 (50) and protein secretion (24). Together, these studies highlight modification of the SR1 domain as a potential mechanism for toggling between the intracellular and extracellular functions of GPBP-2 in invertebrate organisms. Furthermore, GPBP-2 having both intra- and extracellular localization, with the potential to function as both a lipid transporter and a kinase in early metazoans, implies that it played an important role in the evolutionary transition to multicellular animals (Fig. 8).
Our phylogenetic studies demonstrate that GPBP-1 arose as a second isoform early in chordate evolution (Fig. 8). The emergence of GPBP-1, as an extracellular component of BM, coincided with the increased complexity of the BM that enabled the genesis and evolution of multicellular vertebrate tissues (2, 16). Key mechanisms in the diversification of GPBP-2 into GPBP-1 were intron gain events that resulted in the insertion of a second serine repeat domain (SR2). The SR2 domain of GPBP-1 is known to enhance phosphorylation potential, interaction with other basement membrane proteins, and extracellular localization (26). These findings suggest that the enhanced extracellular kinase activity of GPBP-1 was essential for its role as an early constituent of the BM that enabled the formation and evolution of epithelial tissues. GPBP-3, a mammalian innovation (Fig. 8), further enhances GPBP-1 secretion (27). Collectively, we conclude that the GPBP isoforms, functioning both inside and outside the cell, are of fundamental importance for cell and tissue function.

Conserved domain analysis
The Pfam database (http://pfam.xfam.org/) was used to identify sequence homology and conserved domains.

Phylogenetic analysis
MSAs of GPBP proteins from various species (Table S1) were generated using Geneious software (17). Phylogenetic trees were also constructed within Geneious using RAxML (version 7.2.8) with the following settings: RAxML rapid hill-climbing mode, using one distinct model/data partition with joint branch length optimization, executing 100 nonparametric bootstrap inferences, ML estimate of 25 per-site rate categories. Final tree likelihood was evaluated and optimized under GAMMA model parameters and estimated up to an accuracy of 0.1000000000 log likelihood units.
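For readers who want to approximate these settings outside Geneious, the Python sketch below assembles a standalone `raxmlHPC` argument list. It is one reading of the settings listed above, not a record of the command Geneious issued, which is not stated here. The mapping assumed is: `-f d` for rapid hill-climbing, `-b` with `-#` for 100 nonparametric bootstrap replicates, `-c 25` for the per-site rate categories, and `-e 0.1` for the likelihood accuracy. The substitution model is an assumption: a CAT-type model is inferred from the wording about per-site rate categories and GAMMA evaluation of the final tree, and the WAG matrix is purely a placeholder. File names and seeds are hypothetical.

```python
def raxml_bootstrap_args(alignment: str,
                         run_name: str,
                         model: str = "PROTCATWAG",  # assumed; model not stated in the text
                         n_bootstraps: int = 100,
                         seed: int = 12345) -> list[str]:
    """Build an argument list for a RAxML nonparametric bootstrap run."""
    return [
        "raxmlHPC",
        "-f", "d",                 # rapid hill-climbing search algorithm
        "-s", alignment,           # input alignment file
        "-n", run_name,            # suffix used for the output files
        "-m", model,               # substitution model (assumption, see above)
        "-p", str(seed),           # seed for the parsimony starting trees
        "-b", str(seed),           # seed that switches on nonparametric bootstrapping
        "-#", str(n_bootstraps),   # number of bootstrap replicates
        "-c", "25",                # number of per-site rate categories
        "-e", "0.1",               # likelihood accuracy in log likelihood units
    ]


# Hypothetical file and run names.
args = raxml_bootstrap_args("gpbp_alignment.phy", "gpbp_boot")
command_line = " ".join(args)
```

The list can be passed to `subprocess.run(args, check=True)` on a machine where `raxmlHPC` is installed; building it as a list rather than a shell string avoids quoting problems with file names.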

Identification of non-AUG translation initiation sites
Non-AUG start sites were identified using Virtual Ribosome software (version 2) (56).

Genomic characterization of N. vectensis COL4A3BP
The genomic arrangement of COL4A3BP was analyzed using the Nematostella genome browser (https://genome.jgi.doe.gov/Nemve1/Nemve1.home.html) (55). Several assembly inversion errors were detected by manual inspection of the genomic region coding for NvGPBP. CLC software was used to make corrections to the genomic sequence. PCR using gap-flanking primers was used to confirm sequence corrections. The following primers were used for the amplification reaction: forward, CCTGTACCACCCCCTACAGA; reverse, CCCCATCTTCATGAACCAAGT.
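Basic properties of the two primers can be checked with a short Python sketch that computes length, GC fraction, reverse complement, and a rough melting temperature. The melting temperature uses the Wallace rule, Tm = 2(A+T) + 4(G+C) °C, which is only a first approximation for short oligonucleotides and is an assumption of this sketch, not a value or method reported in this study. The reverse primer is written without the line-break hyphen, and the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

FORWARD = "CCTGTACCACCCCCTACAGA"
REVERSE = "CCCCATCTTCATGAACCAAGT"  # line-break hyphen removed


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def gc_fraction(seq: str) -> float:
    """Fraction of bases that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq: str) -> int:
    """Rough melting temperature in degrees C by the Wallace rule: 2(A+T) + 4(G+C)."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


summary = {
    name: (len(seq), round(gc_fraction(seq), 2), wallace_tm(seq))
    for name, seq in (("forward", FORWARD), ("reverse", REVERSE))
}
```

The reverse complement of the reverse primer is the sequence to search for on the same strand as the forward primer when locating the expected amplicon in the corrected genomic sequence.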

Expression of intracellular and extracellular NvGPBP
Flagged Nematostella GPBP was expressed with the Bac-to-Bac Baculovirus Expression System in Sf9 insect cells (Thermo Fisher Scientific).

Immunofluorescence studies
Immunofluorescence slides were prepared using both frozen and paraffin-embedded samples. Nematostella samples for frozen sections were prepared by immobilizing the samples in one-third strength sea salt water containing 7% MgCl2, followed by dehydration in 10% and then 30% sucrose baths and flash freezing in OCT. Paraffin-embedded slides were deparaffinized and rehydrated using citrate buffer, pH 6.3. Slides were fixed with chilled acetone, washed in PBS, and blocked with goat serum, after which slides were treated with mouse N27 and rat JK2 mAb for 2 h and then washed in 3× PBS for 5 min. Anti-mouse or anti-rat antibody with fluorescent tag was used as secondary antibody and allowed to incubate for 1 h and 45 min, and slides were then washed with 3× PBS for 5 min. Nuclear staining was performed using 4′,6-diamidino-2-phenylindole. All slides were analyzed using a high-definition microscope at the University of Valencia core. Negative controls were prepared in the same manner except for the application of N27.

In vitro phosphorylation assays
Phosphorylation assays with purified recombinant Nematostella or human GPBP were performed as described (24).