Demosponge and Sea Anemone Fibrillar Collagen Diversity Reveals the Early Emergence of A/C Clades and the Maintenance of the Modular Structure of Type V/XI Collagens from Sponge to Human*

Collagens are often considered a metazoan hallmark, with the fibril-forming fibrillar collagens present from sponges to human. From evolutionary studies, three fibrillar collagen clades (named A, B, and C) have been defined and shown to be present in mammals, whereas the emergence of the A and B clades predates the protostome/deuterostome split. Moreover, several C clade fibrillar collagen chains are present in some invertebrate deuterostome genomes but not in protostomes whose genomes have been sequenced. The newly sequenced genomes of the choanoflagellate Monosiga brevicollis, the demosponge Amphimedon queenslandica, and the cnidarians Hydra magnipapillata (Hydra) and Nematostella vectensis (sea anemone) allow us to have a better understanding of the origin and evolution of fibrillar collagens. Analysis of these genomes suggests that an ancestral fibrillar collagen gene arose at the dawn of the Metazoa, before the divergence of sponge and eumetazoan lineages. The duplication events leading to the formation of the three fibrillar collagen clades (A, B, and C) occurred before the eumetazoan radiation. Interestingly, only the B clade fibrillar collagens preserved their characteristic modular structure from sponge to human. This observation is compatible with the suggested primordial function of type V/XI fibrillar collagens in the initiation of the formation of the collagen fibrils.

Like all collagens, the fibrillar molecules are made of three α chains, which can either be identical or result from a combination of two or three genetically distinct α chains. Each α chain consists of a major uninterrupted triple helical or collagenous domain made up of ~338 Gly-Xaa-Yaa triplets, which is flanked by two noncollagenous regions, the N- and C-propeptides. During the maturation of procollagens into collagen molecules, the two propeptides are generally cleaved by specific proteinases yielding processed molecules consisting of an ~300-nm-long rod-like structure, representing the triple helix, flanked by short noncollagenous segments, the N- and C-telopeptides.
Once processed, these fibrillar collagen molecules are involved in the formation of the well known cross-striated fibrils. In mammals, the fibrillar collagens involved in the formation of cross-striated fibrils are types I-III, V, and XI. These fibrils are usually heterotypic structures, consisting of one quantitatively minor (V or XI) and one or two quantitatively major (I-III) types of fibrillar collagen. Fibrils present in cartilage, which are constructed with type II and XI collagens, can be distinguished from those located in noncartilaginous tissues, which include type I, III, and V collagens. More recently, it has been shown that a newly characterized fibrillar collagen, type XXVII, is involved in the formation of thin nonstriated fibrils (11,12). As is the case for type XXIV collagen, the major triple helix of type XXVII is slightly shorter than other fibrillar collagen chains and has two glycine substitutions and one Gly-Xaa-Yaa-Zaa imperfection (13)(14)(15).
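The two kinds of triple helix irregularity mentioned above (glycine substitutions and Gly-Xaa-Yaa-Zaa imperfections) can be illustrated with a small scanning routine. The following Python sketch is not part of the original study; the function name and the resynchronization heuristic are assumptions made for illustration, and a substitution immediately followed by a glycine in the Xaa position would be misread as an insertion.

```python
def scan_triple_helix(seq: str) -> list[tuple[int, str]]:
    """Walk a putative triple helix in Gly-Xaa-Yaa steps and report breaks.

    Returns (0-based position, kind) tuples, where kind is either
    "substitution" (a non-Gly residue where Gly is expected) or
    "insertion" (one extra residue, i.e. a Gly-Xaa-Yaa-Zaa imperfection).
    Heuristic (an assumption of this sketch): if the residue right after
    the unexpected one is Gly, treat the unexpected residue as an extra
    Zaa and shift the reading frame by one.
    """
    events = []
    i = 0
    while i < len(seq):
        if seq[i] == "G":
            i += 3  # regular Gly-Xaa-Yaa triplet
        elif i + 1 < len(seq) and seq[i + 1] == "G":
            events.append((i, "insertion"))
            i += 1  # skip the extra residue and resynchronize on Gly
        else:
            events.append((i, "substitution"))
            i += 3  # frame is kept; only the Gly is replaced
    return events


if __name__ == "__main__":
    print(scan_triple_helix("GPPGAPGPK"))      # perfect repeats
    print(scan_triple_helix("GPPAPPGPP"))      # one glycine substitution
    print(scan_triple_helix("GPPGPPAGPPGPP"))  # one Gly-Xaa-Yaa-Zaa imperfection
```

On real sequences, positions reported by such a scan would still need to be checked against a multiple alignment before being interpreted.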
From evolutionary studies, mammalian fibrillar collagen chains have been divided into three clades: the A clade including types I-III and the proα2(V) chains; the B clade comprising proα1(V), proα3(V), proα1(XI), and proα2(XI) chains; and the C clade comprising type XXIV and XXVII collagen chains (14,16). The α chains of the A clade possess a VWC module (absent in the proα2(I) chain) in their N-propeptide, in addition to a minor triple helix. The other two fibrillar collagen clades possess a TSPN module in their N-propeptide in addition to or in the absence of a minor triple helix for B and C clade members, respectively. These three fibrillar collagen clades defined in vertebrates have been found in deuterostome invertebrates, whereas only A and B clade members have been characterized in protostomes (17)(18)(19). More recently, several Hydra fibrillar collagen chains have been characterized, which from phylogenetic studies appear to be most closely related to the A clade members (9), although none of these collagens possess a VWC module in their N-propeptide. Moreover, Blast analyses of sponge or Hydra ESTs suggest the presence of B clade collagens in these taxa (9,20). Fibrillar collagen chains of undetermined clade have also been characterized in the freshwater demosponge Ephydatia mülleri (21) and in the marine demosponge Suberites domuncula (22).
In this report, we have studied the evolution of fibrillar collagens by taking advantage of the publicly available genome data from the sponge Amphimedon queenslandica (formerly classified as Reniera sp.) and the sea anemone Nematostella vectensis. We demonstrate that the formation of an ancestral B clade fibrillar collagen chain predated the eumetazoan radiation. Moreover, demosponge and cnidarian data suggest that although the emergence of the three fibrillar collagen clades occurred early in evolution, only the B clade preserves its characteristic modular structure in modern metazoans, from sponges to humans.

EXPERIMENTAL PROCEDURES
Data Base Searching-Sequences from metazoan fibrillar collagen chains used in this study were obtained either from the European Bioinformatics Institute, from previous work on mosquito, honeybee, and ascidian collagens (17), or from Blast searches in sponge and cnidarian genomes. To identify fibrillar collagens in early branching metazoans, different approaches were used. For the demosponge A. queenslandica, simple reads from whole genome shotgun sequencing available in Trace Archive data bases at the National Center for Biotechnology Information were mined with Blast using sequences encoding C-propeptides and triple helix from different sources. Each read isolated by this approach and its mate were used to construct contigs by repeating several cycles of Blast analysis and contig assembly. Thus, this strategy permitted us to assemble five genomic contigs and to characterize seven distinct genes encoding fibrillar collagen chains (termed Amq1α to Amq7α). Each contig sequence was checked by compiling all of the overlapping whole genome shotgun reads, the coverage being greater than 4. Gene structures were (i) predicted by using the ab initio program GENSCAN (23) at the Massachusetts Institute of Technology and (ii) completed by careful examination of the open reading frame and by matching these assemblies with existing ESTs. Several ESTs covering part of the coding region were obtained for Amq1α (CAYH1651 and CAYI6544), Amq2α (CABF4997, CAYH5770, CAYI7575, and CAYI5900), Amq3α (CABF3931, CAYH3995, CAYH3529, CAYH6239, CAYH4772, and CABF22710), Amq4α (CAYI5983, CAYH5842, and CAYI1569), Amq5α (CAYH3025, CAYI6261, CAYI7211, and CAYI7901), Amq6α (CABF3357, CAYH6535, and CAYI9396), and Amq7α (CAYI8810 and CAYI1553). A. queenslandica ESTs can be downloaded from the Ensembl Trace Server.
The accession numbers, species abbreviations, and sources are compiled in Table 1. The modular architecture of the fibrillar collagen chains has been analyzed using the SMART server (24). The nucleotide sequences of the sponge and Hydra contigs assembled in this work are available upon request. Sequences from the choanoflagellate Monosiga brevicollis genome (25) encoding triple helical or C-propeptide (COLF1) modules were obtained using the advanced search tool of the Joint Genome Institute server. Blast analysis failed to recover additional sequences encoding TSPN or COLF1 modules.
Amphimedon Molecular Techniques-A. queenslandica adult sponges, larvae, embryos, and postlarvae were procured as previously described (26,27). After storage in RNAlater (Sigma), total RNA was extracted using an RNeasy extraction kit (Qiagen) as per the manufacturer's instructions. RT-PCR was successfully conducted on total RNA as previously described (28).
Alignment and Evolutionary Analysis-The major triple helices of metazoan fibrillar collagen chains were first aligned using ClustalW (29) with the PAM alignment matrix at the European Bioinformatics Institute. The resulting initial alignments were manually improved using the SeaView alignment editor (30) and were scanned using the RASCAL component (31) of PipeAlign, a toolkit for protein family analysis. Neighbor-joining (NJ), maximum likelihood (ML), and Bayesian phylogenetic analyses were performed on the validated multiple alignments. NJ trees were determined using the Phylo_win program (30). NJ bootstrap support was based on 1000 replicates using SEQBOOT and CONSENSE (majority rule extended) of the PHYLIP package (32) to generate data replicates and consensus trees, respectively. The PHYML v2.4.4 algorithm (33) was applied for the ML analyses, under the WAG amino acid substitution model and with 100 bootstrapped data sets using the PhyML Online server. Bayesian analysis was carried out as implemented in MrBayes 3.1.2 (34, 35) with a mixed amino acid model incorporating invariant sites and gamma parameter. Analyses in MrBayes were conducted using default settings, with two parallel runs of one million generations each, using four chains and a sample frequency of 100. Likelihoods of Bayesian analyses converged after the initial 21,000 generations. We therefore discarded the initial 210 trees from each of the two parallel runs and computed posterior probabilities from the remaining trees. The standard deviation of split frequencies after 1 million generations was below 0.01.
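The burn-in stated above follows directly from the sampling settings: 21,000 generations sampled every 100 generations correspond to 210 trees per run. A minimal Python sketch of this bookkeeping is given below; the function name is hypothetical, and the count of retained trees ignores the extra sample that MrBayes records at generation 0, which is an assumption of this illustration rather than a figure reported in the text.

```python
def burnin_summary(generations: int, sample_freq: int, burnin_generations: int) -> tuple[int, int]:
    """Return (trees discarded as burn-in, trees retained) for one run.

    The generation-0 sample is deliberately not counted (assumption of
    this sketch), so the retained count is approximate by one tree.
    """
    burnin_trees = burnin_generations // sample_freq
    sampled_trees = generations // sample_freq
    return burnin_trees, sampled_trees - burnin_trees


if __name__ == "__main__":
    # Settings taken from the text: 1,000,000 generations, sample frequency 100,
    # convergence after the initial 21,000 generations.
    print(burnin_summary(1_000_000, 100, 21_000))  # -> (210, 9790)
```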
The illustrations were drawn using the TreeView program (36) and then annotated using Adobe Illustrator. In the absence of a good outgroup, some trees were rooted using the midpoint rooting method and the retree editor from the PHYLIP package (32). Midpoint rooting places the root at the middle of the longest path between the two most distantly related taxa.
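Midpoint rooting as described above can be sketched in a few lines of Python. This is an illustrative reimplementation, not the PHYLIP retree code: the tree is represented as a hypothetical nested dictionary of branch lengths, and the brute-force search over all leaf pairs is only suitable for small trees.

```python
from itertools import combinations


def path_between(tree, start, goal):
    """Return the unique node path between two nodes of an unrooted tree."""
    stack = [(start, [start])]
    while stack:
        node, path = stack.pop()
        if node == goal:
            return path
        for neighbor in tree[node]:
            if neighbor not in path:
                stack.append((neighbor, path + [neighbor]))
    raise ValueError("nodes are not connected")


def path_length(tree, path):
    """Sum the branch lengths along a node path."""
    return sum(tree[a][b] for a, b in zip(path, path[1:]))


def midpoint_root(tree):
    """Locate the midpoint of the longest leaf-to-leaf path.

    Returns (node_a, node_b, offset): the root lies on branch a-b,
    `offset` branch-length units away from node_a.
    """
    leaves = [node for node, nbrs in tree.items() if len(nbrs) == 1]
    longest = max(
        (path_between(tree, a, b) for a, b in combinations(leaves, 2)),
        key=lambda p: path_length(tree, p),
    )
    half = path_length(tree, longest) / 2
    walked = 0.0
    for a, b in zip(longest, longest[1:]):
        if walked + tree[a][b] >= half:
            return a, b, half - walked
        walked += tree[a][b]


if __name__ == "__main__":
    # Toy star tree (hypothetical branch lengths): leaves A, B, C joined at X.
    toy = {
        "A": {"X": 1.0},
        "B": {"X": 2.0},
        "C": {"X": 5.0},
        "X": {"A": 1.0, "B": 2.0, "C": 5.0},
    }
    print(midpoint_root(toy))  # root on branch X-C, 1.5 units from X
```

In the toy example the longest path is B to C (length 7), so the root is placed 3.5 units from each of these two leaves.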

Demosponge and Cnidarian Fibrillar Collagen Diversity-Aiming to understand the early evolution of fibrillar collagens, we searched the genomes and ESTs of one demosponge and two cnidarians for genes possessing triple helix and C-propeptide sequences similar to Hydra and human fibrillar collagens. This survey led to the characterization of seven fibrillar collagen α chains in the demosponge A. queenslandica, named Amq1α to Amq7α, and eight fibrillar collagen α chains in the sea anemone N. vectensis, named Nve1α to Nve8α. For H. magnipapillata, we could detect only the complete major triple helical sequences for the α chains orthologous to those previously characterized in H. vulgaris (Hcol1, Hcol2, Hcol3, and Hcol5) (9). The sponge and cnidarian α chains have the general modular structure of eumetazoan fibrillar collagens, i.e. a major triple helix flanked by the N- and C-propeptides (Fig. 1) (see "Experimental Procedures" and Fig. 1B). For this reason, the mRNA sequences encoding the N-propeptide domain of sponge chains Amq1α, Amq3α, Amq5α, Amq6α, and Amq7α were amplified by RT-PCR to confirm our predictions and to analyze the expression of the related collagen genes throughout the life cycle of A. queenslandica (Fig. 2). Most of these genes, specifically Amq1α, Amq3α, Amq5α, and Amq7α, are expressed throughout the sponge life cycle. Among these genes, Amq1α and Amq3α have higher expression levels during metamorphosis than other stages, whereas Amq7α has lower expression levels (barely detectable) during this period. In contrast, Amq5α seems to be more highly expressed during embryogenesis and in the larva, with levels dropping off during metamorphosis and in adults. The last gene, Amq6α, does not appear to be expressed in sponge adults.
Taking into account the cnidarian and sponge fibrillar collagens for which the N-propeptide sequence is available, three groups of α chains can be distinguished by their modular organization. The first group includes fibrillar collagen chains that have an N-propeptide restricted to a minor triple helix (Amq1α, Amq3α, Amq4α, and Hydra Hcol1), a situation observed in some A clade members such as mammalian proα2(I) and sea urchin 1α chains (37). Moreover, Amq1α appears to be the ortholog of the E. mülleri Emu1α chain with two glycine substitutions and one Gly-Xaa-Yaa-Zaa imperfection in identical positions (Fig. 1B).
The second group comprises cnidarian α chains possessing WAP modules in their N-propeptide (Nve1α, Nve2α, Nve3α, Hcol2, and Hcol3; Fig. 1C and Ref. 9). It should be noted that despite the high level of amino acid identity between fibrillar collagens of the two Hydra species (close to 100%), differences were observed for two collagen genes. First, the major triple helix of H. magnipapillata Hcol3 consists of 1023 residues instead of the 969 residues of the H. vulgaris Hcol3 chain. In H. magnipapillata, the additional 54 residues are encoded by a 162-bp exon (supplemental Fig. S1A), suggesting that the discrepancy between these two species results from (i) an alternative splicing event in H. vulgaris, (ii) the loss of this exon in H. vulgaris, or (iii) a cloning artifact in the cDNA characterized in the H. vulgaris study. Second, the H. magnipapillata Hcol2 fibrillar collagen chain possesses one less Gly-Xaa-Yaa triplet in the main collagenous domain compared with the H. vulgaris chain. Hence, one of the two GEQ triplets (31st or 32nd of the major triple helix) present in the H. vulgaris Hcol2 chain is absent from H. magnipapillata (supplemental Fig. S1B).
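The relation between the length difference of the two Hcol3 triple helices and the size of the additional exon is simple codon arithmetic, shown here as a short Python check. The function name is hypothetical; only the residue counts (1023 and 969) come from the text.

```python
CODON_LENGTH = 3    # nucleotides per amino acid residue
TRIPLET_LENGTH = 3  # residues per Gly-Xaa-Yaa repeat


def extra_exon_size(longer_helix: int, shorter_helix: int) -> tuple[int, int, int]:
    """Return (extra residues, coding length in bp, Gly-Xaa-Yaa triplets)."""
    extra_residues = longer_helix - shorter_helix
    return (
        extra_residues,
        extra_residues * CODON_LENGTH,
        extra_residues // TRIPLET_LENGTH,
    )


if __name__ == "__main__":
    # H. magnipapillata Hcol3 (1023 residues) versus H. vulgaris Hcol3 (969 residues)
    print(extra_exon_size(1023, 969))  # -> (54, 162, 18)
```

The 54 additional residues thus correspond to 162 bp of coding sequence and to 18 complete Gly-Xaa-Yaa triplets, so the extra exon does not disturb the triplet periodicity.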
In the third group, which includes sponge Amq5α, Amq6α, and Amq7α and sea anemone Nve7α and Nve8α collagen chains (Fig. 1), we noted the presence of a large noncollagenous domain in their N-propeptide. From SMART analyses, this noncollagenous domain clearly corresponds to a TSPN module for Amq5α and Nve7α. As shown in Table 1, the TSPN modules of sea anemone Nve7α and sponge Amq5α chains have E value scores comparable with those obtained with bilaterian B clade and C clade collagen TSPN modules, respectively. From multiple alignments (Fig. 3 and Table 2), the TSPN domains of Amq5α and Nve7α share on average 18 and 30% of identity with comparable modules of human B clade members, respectively. For the other sponge (Amq6α and Amq7α) and sea anemone (Nve8α) chains in this group, the large noncollagenous domain might also correspond to a TSPN module (Table 1), although the E values for the TSPN modules are not significant in SMART. The peculiar location of this noncollagenous domain in their N-propeptide strongly supports that these sequences are TSPN modules. This result is in agreement with poor identity observed for this module between B and C clade members or diploblast α chains and B/C clade fibrillar collagens (Table 2 and Table 1 for abbreviations. OCTOBER 17, 2008 • VOLUME 283 • NUMBER 42 members (they contain a TSPN module and a minor triple helix), whereas Amq7α and Nve8α are related to C clade chains (they contain a TSPN module but not a minor triple helix). However, the latter two chains differ from human C clade collagens in the length of their major triple helix (Fig. 1): longer for Amq7α (1008 residues) and significantly shorter for Nve8α (961 residues). From the data available to date, we could not investigate the N-propeptide structure of three N. vectensis chains, Nve4α to Nve6α, and of the sponge Amq2α fibrillar collagen chain.
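The module-based grouping used in this section can be summarized as a simple decision rule on the N-propeptide content. The Python sketch below is only a schematic restatement of the criteria given in the text (TSPN plus a minor triple helix for B clade-related chains, TSPN without a minor triple helix for C clade-related chains, no TSPN for A clade-related chains); the function name, the boolean encoding, and the returned labels are assumptions of this illustration, and the rule does not replace the phylogenetic analysis.

```python
def classify_by_n_propeptide(has_tspn: bool, has_minor_helix: bool, has_vwc: bool = False) -> str:
    """Assign a tentative clade affinity from N-propeptide modules only."""
    if has_tspn:
        # TSPN-plus chains: the minor triple helix separates B-like from C-like
        return "B-like" if has_minor_helix else "C-like"
    if has_vwc or has_minor_helix:
        # TSPN-minus chains with a VWC module and/or a minor triple helix
        return "A-like"
    return "undetermined"


if __name__ == "__main__":
    # Module content as described in the text for three sponge chains
    print("Amq5α:", classify_by_n_propeptide(has_tspn=True, has_minor_helix=True))
    print("Amq7α:", classify_by_n_propeptide(has_tspn=True, has_minor_helix=False))
    print("Amq1α:", classify_by_n_propeptide(has_tspn=False, has_minor_helix=True))
```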

Early Evolution of Fibrillar Collagen Genes
Collagen-like Proteins and COLF1 Module in the Choanoflagellate M. brevicollis-Choanoflagellates are considered the closest known relatives of metazoans. In the recent work describing the genome of M. brevicollis, the authors indicated the presence of sequences encoding triple helical domains and COLF1, a module always retrieved to date at the C terminus of metazoan fibrillar collagen chains (25). One of the two M. brevicollis putative collagenous proteins has several triple helical domains interspersed with VWA modules (Fig. 1D). This modular organization is reminiscent of the Hydra collagen Hcol6 (9), although the distribution of VWA and triple helical domains is not identical. Three other M. brevicollis proteins contain a COLF1 module. Sequence alignments of M. brevicollis COLF1 modules with comparable domains of some demosponge and cnidarian fibrillar collagens are represented in Fig. 4. Among the eight specific cysteine residues of the metazoan COLF1 module, the three choanoflagellate sequences lack cysteine residues 5 and 8, which are responsible for an intrachain disulfide bond in fibrillar α chains (38). These sequences also lack cysteine residues 2 and 3. Interestingly, either cysteine 2 or cysteine 3 is absent in some fibrillar collagen chains (for example, the vertebrate proα2(I) chain), whereas both of these cysteine residues are absent in sponge Amq3α and Amq4α and in the worm Arenicola marina FAm1α chain (16).
Genomic Context of Genes Encoding Demosponge and Cnidarian Fibrillar Collagen Chains-Two of the N. vectensis chains (Nve2α and Nve8α) have major triple helices shorter than A and B clade chains (Fig. 1). As shown in supplemental Fig. S3, the loss of exons might explain the shortening of their major triple helix. With the exception of these two genes, all of the sponge and cnidarian fibrillar collagen genes have intron/exon structures in their regions encoding the major triple helix that match the suggested organization of an ancestral fibrillar collagen gene (39). Moreover, in the sponge A. queenslandica, two sets of paralogous collagen genes are arranged in tandem (Amq1α-Amq5α and Amq2α-Amq4α) in a head-to-head fashion. The distance between the two ATG start codons of the gene pairs Amq1α-Amq5α and Amq2α-Amq4α is 932 bp and less than 3400 bp, respectively. In the sea anemone, two fibrillar collagen genes (Nve6α-Nve7α) are also arranged in tandem, but in a tail-to-tail configuration, with 2699 bp separating their TGA stop codons.
Presence of B and C Clade Fibrillar Collagen Chains in Demosponges and Cnidarians-Comparison of modular organization of metazoan fibrillar collagen chains provides evidence for the presence of A clade and B/C clade-related chains in early branching metazoans. We further explored the affinities of sponge and cnidarian collagen chains with phylogenetic analyses. Because of the N-propeptide modular organization diversity and the C-propeptide variability, we only used the major triple helical sequences in these analyses. Multiple alignments of bilaterian major triple helices confirm the specific pattern of exon lengths of fibrillar collagen genes (supplemental Fig. S3 and Ref. 17). However, this is not always the case when multiple alignments are performed using all metazoan α chains (data not shown), except when sponge Amq1α, Emu1α, Amq7α, sea anemone Nve8α, and ascidian Cin906 α chains are removed. With the assumption that conservation of exon/intron organization of metazoan fibrillar collagen genes in the region encoding the major triple helix reflects conservation of related amino acid sequences, the complete multiple alignments were corrected, taking into account the genomic organization of corresponding genes. These corrected multiple alignments (supplemental Fig. S4) were used to perform phylogenetic analyses using ML and Bayesian methods. As shown in Fig. 5A, bilaterian α chains are clustered into the three well established fibrillar collagen clades. Moreover, the phylogenetic distribution of sponge and cnidarian α chains agrees on the whole with classification based on their modular structure. Hence, all sponge and sea anemone fibrillar collagen chains harboring a TSPN-like module in their N-propeptide are related to B and C clade collagens. Among these collagens, the sea anemone Nve7α chain can be confidently assigned to the B clade, as was suggested by its modular structure. In contrast, the relationships between the sea anemone Nve8α and the C clade or the three sponge chains (Amq5α, Amq6α, and Amq7α) and the B and C clades are poorly supported, with lower bootstrap values and posterior probabilities (see the values highlighted by stars in Fig. 5A). Two other clusters are present in the phylogenetic tree and correspond to sponge and cnidarian chains that do not contain a TSPN module (TSPN-minus) in their N-propeptide. Both clades seem to be related to bilaterian A clade fibrillar collagens (high posterior probability, 94%). Sponge TSPN-minus α chains branch at the base of the A clade, with Emu1α corresponding to the E. mülleri ortholog of the A. queenslandica Amq1α chain.

FIGURE 4. Representative multiple alignments of the COLF1 module from choanoflagellate, sponge, and sea anemone proteins. The alignments were generated with ClustalW. The gray boxes and asterisks denote sequence identity (Threshold = 0.66) and perfectly conserved amino acids between the proteins analyzed, respectively. Numbering of the cysteine residues is indicated above the multiple alignments. See Table 1 and Fig. 1D for abbreviations.

TABLE 2
Percentage of identity between the TSPN module of metazoan fibrillar collagens
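Percentage identity values of the kind reported for the TSPN modules (Table 2) are derived from pairs of aligned sequences. The following Python sketch shows one common way to compute such a value; the choice of denominator (columns in which neither sequence has a gap) is an assumption of this illustration, since the convention used for Table 2 is not stated in the text, and the example sequences are invented.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percentage of identical residues between two aligned sequences.

    Columns containing a gap in either sequence are ignored
    (assumed convention; other denominators are also in use).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != gap and y != gap]
    if not pairs:
        return 0.0
    matches = sum(x == y for x, y in pairs)
    return 100.0 * matches / len(pairs)


if __name__ == "__main__":
    # Invented toy alignment: 3 identical residues out of 4 compared columns
    print(percent_identity("ACD-F", "ACE-F"))  # -> 75.0
```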
To further analyze the position of sponge α chains, we performed a second phylogenetic analysis excluding most of the cnidarian α chains. This resulted in a phylogenetic tree of comparable topology (Fig. 5B). Hence, whatever the phylogenetic analysis, sponge fibrillar collagen chains can be separated into two groups, with (TSPN-plus) and without (TSPN-minus) TSPN modules, which were positioned inside the B/C and A fibrillar collagen clades, respectively. However, the specific relationship between the sponge TSPN-plus fibrillar collagen cluster and B/C clade members is difficult to determine. Although the modular structure of their N-propeptide suggested that Amq5α and Amq6α were affiliated with the B clade and Amq7α with the C clade, in the phylogenetic tree these three chains are clustered together in an unresolved position inside the B/C group. Their position at the base of the B clade is poorly supported in ML and Bayesian analyses (Fig. 5). The sponge TSPN-plus chains may instead be at the base of a cluster including B and C clade collagen members, as found in the NJ analysis (low bootstrap support, 43%).
Early Evolution of Metazoan Collagen Chains-Altogether, these data suggest that the divergence between ancestral A and B/C clade fibrillar collagen chains occurred very early during metazoan evolution, before the separation of poriferan and eumetazoan lineages. Based on our analyses, a model for the evolution of fibrillar collagen chains is presented in Fig. 6. The presence of proteins with either triple helical domains or COLF1 module in the choanoflagellate M. brevicollis suggests that an ancestral fibrillar collagen harboring both modules arose at the very dawn of metazoans. However, we cannot reject the possibility that this choanoflagellate has secondarily lost genomic sequence encoding a fibrillar collagen chain during its evolution.
In this model, the emergence of A and B/C clades predated the divergence of Parazoa and Eumetazoa, as illustrated by the diversity of fibrillar collagen chains present in the marine demosponge A. queenslandica. Indeed, sponge TSPN-minus α chains are related to the A clade. However, the addition of a VWC module in the N-propeptide of A clade fibrillar collagens seems to have occurred after the cnidarian/bilaterian split. Interestingly, the WAP module present in some cnidarian A clade-related chains corresponds to a cysteine-rich domain (8 cysteine residues) that is reminiscent of the VWC module (10 cysteine residues) in its length (~50-60 residues), its location in the N-propeptide, and the presence of two successive cysteine residues near its C terminus. This suggests that a VWC module may have evolved from a WAP module after cnidarians had departed from the main eumetazoan lineage. However, from the StellaBase web server, several predicted sea anemone proteins contain VWC modules. From SMART analysis, all of these sequences are positively recognized as VWC modules (E values between 10⁻⁵ and 10⁻¹), this result being in contradiction to a close relationship between WAP and VWC domains.
As shown in our model, two distinct scenarios can be proposed for the evolution of the B and C clades. In the first scenario, which is supported by our phylogenetic analyses, sponge TSPN-plus collagens are descended from a B/C clade ancestral fibrillar chain. In this case, the divergence of B and C clades occurred between the Parazoa-Eumetazoa and Cnidaria-Bilateria splits, as suggested by the presence of B and C clade-related fibrillar collagen chains in the sea anemone. In the second scenario, supported by the modular organization of sponge collagen N-propeptides, the emergence of B and C clades from a B/C ancestor occurred earlier, before the emergence of sponges.

DISCUSSION
The sea anemone genome has revealed that in regard to transcription factors and other protein families, the last common ancestor to all modern eumetazoans had a complex genome, which was more similar to vertebrate genomes than those of flies and nematodes (40,41). In contrast, the genome of the demosponge A. queenslandica indicates that gene families may have not yet fully diversified in the first metazoan animals (42)(43)(44). Here we use the cnidarian and sponge genomes to explore the origin and evolution of fibrillar collagen. Based on phylogenetic analyses, the emergence of the three fibrillar collagen clades predated the emergence of the Eumetazoa. Although not strongly supported, the general congruence between this analysis and the modular structure of sponge fibrillar collagens is compelling evidence of the early emergence of the A, B, and C clades, possibly prior to the divergence of poriferan lineage. The major finding that will be discussed below is the great conservation of the B clade during metazoan evolution and its existence in the first multicellular animals.

In hypothesis H1, demosponges possess fibrillar collagen chains descended from an ancestral B/C clade fibrillar collagen chain containing a TSPN module in its N-propeptide. In this scenario, the emergence of B and C clades occurred between Parazoa-Eumetazoa and Cnidaria-Bilateria splits. In hypothesis H2, the divergence of B and C clades from an ancestral B/C clade α chain predated metazoan cladogenesis. Whatever the scenario, the B clade seems to correspond to the most conserved fibrillar collagen family, whether it is at the modular level (from sponge to human) or at the sequence level (from sea anemone to human).

Early Emergence of the Three Fibrillar Collagen Clades-The use of newly available genomic and EST sequence data from early branching animals gave us insights into the early evolution of the fibrillar collagen family, although some issues remain unresolved. We have shown that demosponges already have fibrillar collagen chains related either to the A clade or the B/C clade. Although the modular organization of the TSPN-plus sponge fibrillar collagen chains suggests that B and C clade-related chains are present in demosponges, phylogenetic studies do not lend more support to this scenario than to one that sees these two fibrillar collagen clades emerging between Parazoa-Eumetazoa and Cnidaria-Bilateria splits (Fig. 6). The difficulty in supporting either of these scenarios in our phylogenetic analysis has several origins. First, the long branch lengths indicate increased levels of sequence divergence in sponge and cnidarian fibrillar collagen chains, probably reflecting the ancient divergence of these phyla. Several studies suggest that the eumetazoan ancestor lived ~630-830 million years ago (40,45). However, the cladistic distribution observed in the phylogenetic trees (Fig. 5) does not appear to be due to long branch attraction events because the same clustering was obtained with all methods used, and the sponge chains were retrieved at the base of A and B/C clades.
The second point results from the lack of resolution of basal metazoan lineages. The three sponge classes Hexactinellida, Demospongia, and Calcarea are generally considered to be paraphyletic (46). It has also been proposed that Homoscleromorpha, a group of siliceous sponges that possess a basement membrane-like structure, may form a fourth sponge class and should be considered closer to eumetazoans than the demosponges (47). The future availability of genome data from other animal groups at the base of the metazoan tree (all different sponge classes and ctenophores) should improve our model of evolution.
The lack of a fibrillar collagen gene but the presence of sequences encoding proteins harboring either collagenous sequence or a COLF1 module in the newly published genome of the unicellular M. brevicollis (25), a choanoflagellate representing a sister group to the Metazoa, suggest that fibrillar collagens are metazoan-specific extracellular matrix proteins. Our analyses suggest that these genes evolved at the dawn of the Metazoa to give rise to three clades, which were originally defined in mammals. One of the paradoxes we found in this study is the high number of fibrillar collagen genes in sponges and cnidarians in comparison with protostomes (two genes in honeybee and mosquito) or the invertebrate chordate Ciona intestinalis (four genes). As previously indicated for Hydra (9), some lineage-specific gene duplications might explain the high number of TSPN-minus fibrillar collagen chains in Porifera and Cnidaria. The clustering of certain collagen genes in the demosponge and sea anemone genomes supports this scenario. The relationship between Hydra and sea anemone fibrillar collagens, despite the presence in some of these α chains of a WAP module, is not obvious. As previously indicated, the depth of the Hydra-Nematostella split is comparable with the protostome-deuterostome divergence and emphasizes the distant relationship between anthozoans and hydrozoans (40).
B Clade Collagens and Metazoan Evolution-An important result of this study is the conservation of B clade fibrillar collagens, with the same modular organization and triple helix characteristics from sponge to human. More precisely, in our model and with the available data, cnidarians are the earliest branching metazoan phylum to possess a definitive B clade fibrillar collagen ortholog. Moreover, phylogenetic analyses support a relationship between sponge TSPN-plus and B clade fibrillar collagen chains with low statistical support (Fig. 5), although based on modular presence and organization, the sponge possesses B (Amq5α and Amq6α) and C (Amq7α) clade α chains (Fig. 1). Interestingly, Amq5α seems to be more highly expressed during embryogenesis than in the adult, a result reminiscent of vertebrate type V collagen (48). These results are in agreement with previous works suggesting some similarities between invertebrate and types V/XI fibrillar collagen chains (2,7).
Recent data support a pivotal function of type V collagen molecules (α1(V)₂ and α2(V)) in the nucleation of fibril assembly in mouse noncartilaginous tissues (49). In these tissues, mature fibrils are generally heterotypic and are made of types I, III, and V collagens. In col5a1−/− mice, in the absence of the α1(V) chain, fibrils are virtually absent in most noncartilaginous tissues, although embryonic fibroblasts synthesize and secrete normal amounts of type I collagen. It has also been reported that type XI is probably buried inside types II/XI heterotypic fibrils in cartilage (50), suggesting a common role for types V and XI collagens (mostly B clade fibrillar collagens) in early fibril initiation. More recently, it has been demonstrated that vertebrate type XXVII C clade fibrillar collagen forms 10-nm-thick nonstriated fibrils (11,12), an ultrastructural network that is entirely distinct from the well known striated fibrils made of A and B clade collagens. In this respect, C clade collagens (not demonstrated for type XXIV collagen) also have the capacity to initiate the formation of fibrils. Moreover, C clade collagens form thin nonstriated fibrils, whereas types V and XI also play an important function in the regulation of fibril diameter. One of the mechanisms regulating the growth of fibrils is the retention of the N-propeptide for vertebrate types V and XI collagens, the sea urchin 5α chain, and the Hydra Hcol1 α chain (51)(52)(53). From in vitro fibrillogenesis and molecular modeling of triple helical peptides, it has also been argued that the presence of bulky hydrophobic amino acids and glycosylated hydroxylysine residues in the major triple helix might be the main contributor to limiting the lateral accretion of fibrillar collagen molecules, with hydroxylation of lysine residues occurring in the Yaa position of repeating collagen Gly-Xaa-Yaa triplets (54).
C clade and to a lesser extent B clade collagen chains have generally reduced levels of alanine and higher levels of lysine and subsequently glycosylated hydroxylysine residues in comparison with A clade collagens (9,12). From these analyses, it has been suggested that the relatively low levels of alanine and high levels of lysine in the major triple helix of a fibrillar collagen may be a predictor of its capacity to form thin fibrils.
In agreement with this hypothesis, the Hydra fibrillar collagen chains present alanine and lysine levels comparable with B and C clade vertebrate collagens and are involved in the formation of thin, 10-nm nonstriated fibrils. In demosponges, striated fibrils have a uniform diameter of 25 nm (2). However, in their major triple helix, the demosponge collagen chains Amq1α to Amq6α have an alanine level (6.5-11.05%) comparable with human A clade collagens (I-III, 8.8-11.2%) and a lysine level (3.4-5.75%) comparable either to human A clade collagens (I-III, 3-3.7%) or to types V and XI (3.6-5.4%) collagens. However, in these sponge fibrillar collagens, the lysine residues are more often observed in the Yaa position of collagenous Gly-Xaa-Yaa triplets. Hence, their lysine levels in the Yaa position (2.9-4.8%) are comparable with those observed in Hydra and in human B and C clade fibrillar α chains (2.4-4.7%). The last demosponge fibrillar collagen chain, Amq7α, presents in its triple helix a low level of alanine (3.4%) and a very high level of lysine residues (9.7%). These observations indicate that demosponges have triple helical characteristics favoring the formation of thin fibrils, as is the case with Hydra fibrillar collagen chains. Altogether, these data suggest that the B/C clade ancestral fibrillar collagen chain might have had the potential to primarily initiate the formation of fibrils and to secondarily regulate the fibril diameter. Moreover, the presence in the first multicellular animals of members of the three fibrillar collagen clades indicates their common functional importance in the formation of collagen-based extracellular matrices.
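The alanine and lysine levels compared above are simple residue frequencies within the major triple helix. The Python sketch below shows how such values could be computed from a one-letter amino acid sequence. It is an illustration only: the function name is hypothetical, the sequence is assumed to start on a glycine and to contain uninterrupted Gly-Xaa-Yaa repeats, and the lysine level in the Yaa position is expressed relative to all residues of the helix, which is an assumption about the convention rather than a statement of how the published values were obtained.

```python
def helix_composition(seq: str) -> dict[str, float]:
    """Alanine and lysine content (in % of all residues) of a triple helix.

    Assumes an uninterrupted Gly-Xaa-Yaa frame starting at index 0, so
    that Yaa residues sit at indices 2, 5, 8, and so on.
    """
    if not seq:
        raise ValueError("empty sequence")
    n = len(seq)
    yaa_residues = seq[2::3]
    return {
        "ala_pct": 100.0 * seq.count("A") / n,
        "lys_pct": 100.0 * seq.count("K") / n,
        "lys_yaa_pct": 100.0 * yaa_residues.count("K") / n,
    }


if __name__ == "__main__":
    # Invented 9-residue example: two Ala, two Lys, one Lys in a Yaa position
    for name, value in helix_composition("GAKGPAGKP").items():
        print(f"{name}: {value:.1f}%")
```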