Originally published In Press as doi:10.1074/jbc.M500257200 on April 23, 2005
J. Biol. Chem., Vol. 280, Issue 29, 26659-26668, July 22, 2005
Evolution of Matrix and Bone γ-Carboxyglutamic Acid Proteins in Vertebrates*
Vincent Laizé ,
Paulo Martel¶||,
Carla S. B. Viegas ,
Paul A. Price**, and
M. Leonor Cancela
From the
Centro de Ciências do Mar (CCMAR), Universidade do Algarve, 8005-139 Faro, Portugal, ¶Instituto de Tecnologia Química e Biológica (ITQB), Universidade Nova de Lisboa, 2781-901 Oeiras, Portugal, ||Centro de Biomedicina Molecular e Estrutural (CBME), Universidade do Algarve, 8005-139 Faro, Portugal, and **Division of Biology, University of California San Diego, La Jolla, California 92093-0368
Received for publication, January 7, 2005, and in revised form, March 31, 2005.
ABSTRACT
The evolution of calcified tissues is a defining feature in vertebrate evolution. Investigating the evolution of proteins involved in tissue calcification should help elucidate how calcified tissues have evolved. The purpose of this study was to collect and compare sequences of matrix and bone γ-carboxyglutamic acid proteins (MGP and BGP, respectively) to identify common features and determine the evolutionary relationship between MGP and BGP. Thirteen cDNAs and genes were cloned using standard methods or reconstructed through the use of comparative genomics and data mining. These sequences were compared with available annotated sequences; in total, 48 complete or nearly complete sequences (28 BGPs and 20 MGPs) have been identified across 32 different species (representing most classes of vertebrates). Evolutionarily conserved features in both MGP and BGP were analyzed using bioinformatic tools and the Tree-Puzzle software. We propose that: 1) MGP and BGP genes originated from two genome duplications that occurred around 500 and 400 million years ago before jawless and jawed fish evolved, respectively; 2) MGP appeared first concomitantly with the emergence of cartilaginous structures, and BGP appeared thereafter along with bony structures; and 3) BGP derives from MGP. We also propose a highly specific pattern definition for the Gla domain of BGP and MGP.
INTRODUCTION
BGP¹ (bone Gla protein or osteocalcin) and MGP (matrix Gla protein) belong to the growing family of vitamin K-dependent (VKD) proteins, the members of which are involved in a broad range of biological functions such as skeletogenesis and bone maintenance (BGP and MGP), hemostasis (prothrombin, clotting factors VII, IX, and X, and proteins C, S, and Z), growth control (gas6), and potentially signal transduction (proline-rich Gla proteins 1 and 2). VKD proteins are characterized by the presence of several Gla residues resulting from the post-translational vitamin K-dependent γ-carboxylation of specific glutamates, through which they can bind to calcium-containing minerals such as hydroxyapatite. To date, VKD proteins have only been clearly identified in vertebrates (1), although the presence of a γ-glutamyl carboxylase has been reported in the fruit fly Drosophila melanogaster (2) and in marine snails belonging to the genus Conus (3). Gla residues have also been found in neuropeptides from Conus venoms (4), suggesting a wider prevalence of γ-carboxylation.
Bone and matrix Gla proteins have essential roles in controlling tissue mineralization and form a subgroup within the VKD family. BGP and MGP have low similarity with blood coagulation factors, although they are believed to have diverged from a common ancestor (5–7). Although a number of questions remain concerning their mode of action at the molecular level, a large number of studies (Refs. 15–18, 26, 27, 31, and references therein) have clearly shown that BGP and MGP are central to the control of tissue mineralization mechanisms.
MGP is a 10-kDa protein produced and secreted by vascular smooth muscle cells (8) and chondrocytes (9) and significantly accumulated in bone, cartilage, and dentin from mammals, amphibians, and cartilaginous and bony fishes (10–14). MGP gene expression is confined to proliferative and late hypertrophic chondrocytes within the mammalian growth plate during endochondral bone formation and has therefore been described as a marker of the chondrogenesis cell lineage (15). A signal peptide, a phosphorylation domain, a γ-carboxylase recognition site, and 4–5 Gla residues are the conserved features among MGPs. The spontaneous calcification of arteries and cartilage in mice lacking MGP and the abnormal cartilage and artery calcification in patients affected by Keutel syndrome (an autosomal recessive disorder caused by mutations in the MGP gene) indicate that this protein is a physiological inhibitor of mineralization (16–18). Additional data suggest that MGP is involved in protecting tissues from ectopic calcification in mammals (17, 19) and controlling cartilage cell differentiation and mineralization during early development of chicken limbs (9). In vitro cell culture experiments have shown that MGP gene expression can be regulated by 1,25-(OH)2 vitamin D3 (20), retinoic acid (21, 22), extracellular calcium (23, 24), growth factors, and cell proliferation events (25), but the mechanisms responsible for the transcriptional regulation of MGP remain largely unknown.

FIG. 1. BGP and MGP sequences used in this study and taxonomy of represented species. Taxonomic data were retrieved September 14, 2004 from the Integrated Taxonomic Information System at www.itis.usda.gov. a, this study; b, partial sequence; c, Protein Information Resource accession number.
BGP is a 5.6-kDa secreted protein that, unlike MGP, has proven to be specific for vertebrate calcified tissues, with the exception of cartilaginous fishes, and represents 1–2% of the total bone protein. BGP is synthesized by osteoblasts and odontoblasts. All known BGPs are synthesized as pre-pro-proteins and contain 3 Gla residues located within a conserved domain in the central part of the mature protein. Previous studies have failed to reveal a critical role for BGP in mammalian bone formation. BGP-deficient mice exhibit a higher bone mass when compared with controls (26), and IR microspectroscopy has shown that BGP is required for the correct maturation of the hydroxyapatite in mammalian bone (27). The tertiary structures of bovine (28), porcine (29), and meagre (30) BGP have recently been determined by X-ray crystallography (porcine and meagre) and NMR (bovine). Despite some differences, the three structures are very similar, with a novel fold comprising a C-terminal region with 3 helices and a small hydrophobic core and a short, unstructured N-terminal sequence. The 3 Gla residues (4 in meagre) are on the outer face of helix 2, coordinating the Ca2+ ions present in the structures and suggesting a mechanism for attachment to the surface of the hydroxyapatite crystals in bone (29). In the absence of Ca2+, the apoprotein form of BGP is in a disorganized state, as indicated by NMR studies (28). Interestingly, re-expression of MGP in arteries of MGP−/− mice can reverse the arterial mineralization, a result that cannot be mimicked by BGP, indicating that these evolutionarily related proteins do not work in a similar way (31).
Understanding the evolution of BGP and MGP more clearly will contribute to a better evaluation of the various hypotheses about their role and function in tissue mineralization. It should also help illuminate how calcified tissues have evolved and provide key insights into vertebrate evolution. Specific goals of this work were: (i) to identify new MGP and BGP homologues using standard cloning methods or comparative genomics and data mining of available genomic and EST (expressed sequence tag) libraries, (ii) to provide insight into MGP and BGP common features by looking at similarities among gene and protein sequences, and (iii) to infer particular trends in BGP and MGP evolutionary history through a phylogenetic analysis of all available amino acid sequences. Finally, we propose a model for the early origin and evolution of BGP and MGP on the basis of results presented here and in previous studies (Refs. 7, 12–14, and references therein).

FIG. 2. Structural organization of MGP (A) and BGP (B) coding sequences at the gene level. Black boxes indicate exons (or parts of exons) representing the coding sequence, starting from the translation initiation codon and ending at the translation termination codon. Numbers above the boxes indicate their length. Dashed line in zebrafish MGP sequence indicates incomplete data on intron size, although total length of the genomic fragment is known. Data on each gene of the mouse BGP gene cluster are also presented.
EXPERIMENTAL PROCEDURES
BGP and MGP Sequence Collection. Previously characterized sequences were obtained from the GenBank™ (www.ncbi.nlm.nih.gov) and Swiss-Prot (us.expasy.org) data bases. Additional sequences were either cloned using standard PCR techniques or reconstructed from EST and whole genome shotgun sequences obtained from public sequence data bases. dbEST and Trace Archive were extensively searched using the BLAST facilities at NCBI for sequences showing similarities to known BGP and MGP transcripts or genes. Species-specific sequences were first clustered, and elements of each cluster were assembled using Clustal X (32) to generate, after manual correction, highly accurate consensus sequences. Virtual transcripts and genes were deduced from joined consensus sequences using stringent overlap criteria. Virtual gene structure was predicted using comparative methods (homology to previously annotated genes) and electronic splicing as predicted by GenScan. Genomic DNA (prepared with Qiagen DNeasy tissue kit), cDNA libraries (prepared with Clontech Marathon cDNA amplification kit), or reverse transcribed RNA (prepared using Invitrogen Moloney murine leukemia virus reverse transcriptase) were used for PCR amplification. Genomic and cDNA fragments were amplified in a GeneAmp 2400 thermal cycler (PerkinElmer Life Sciences) using Taq DNA polymerase (Invitrogen) and the primer sets described in Table I. Following an initial denaturation step of 5 min at 95 °C, specific DNAs were amplified with 30 cycles (one cycle is 30 s at 95 °C, 45 s at Tm (see Table I), and 1 min at 72 °C) and a final elongation step of 10 min at 72 °C. The resulting PCR products were size-fractionated by agarose gel electrophoresis, purified, and ligated into pGEM-T Easy vector (Promega). Final identification was achieved by DNA sequence analysis.
TABLE I. List of PCR primers used in this study. Hd, Halobatrachus didactylus; Cs, Chilomycterus schoepfii; On, Oreochromis niloticus; Dr, Danio rerio.
Determination of Sparus aurata Genome Size. Blood was collected from a healthy specimen of gilthead seabream (S. aurata), smeared on a microscope slide, and subjected to genome sizing according to Hardie et al. (33).
Sequence Alignment and Analysis. Separate alignments for MGP and BGP sequences were created using Clustal X with the Gonnet250 mutation matrix, a gap-opening penalty of 10, and a gap-extension penalty of 0.2. Clustal X alignments were fed to the more rigorous T-Coffee multiple sequence alignment software (34) with parameters set to the default. Manual adjustments were made in a few cases to improve alignments. An alignment of the whole set of MGP and BGP sequences was produced using a similar procedure. Because there was no recognizable similarity between the MGP and the BGP sequences outside the conserved Gla region, only the latter part of the alignment was kept. Sequence logos presented in Fig. 3 were created from the T-Coffee multiple sequence alignments using the WebLogo facilities (35). The sequence logos are presented as a graphical display in which the height of each letter is proportional to its frequency, so that conserved residues appear as larger characters. Putative signal peptide and phosphorylation sites were identified in protein sequences using the SignalP and NetPhos facilities, respectively (29, 30).
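The per-column residue frequencies that underlie such a logo can be illustrated with a minimal Python sketch. It is not the WebLogo implementation; the function name `column_frequencies`, the treatment of gaps (ignored), and the toy alignment are assumptions made for illustration only.

```python
from collections import Counter


def column_frequencies(alignment, gap="-"):
    """Return one dict per alignment column mapping residue -> relative frequency.

    Gaps are ignored when counting (an assumption of this sketch);
    a column made only of gaps yields an empty dict.
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    profile = []
    for column in zip(*alignment):
        counts = Counter(res for res in column if res != gap)
        total = sum(counts.values())
        if total:
            profile.append({res: n / total for res, n in counts.items()})
        else:
            profile.append({})
    return profile


# Toy alignment (hypothetical, not taken from the MGP/BGP data set).
toy_alignment = ["ECLE", "ECIE", "EC-D", "ECLE"]
profile = column_frequencies(toy_alignment)
```

In this toy case the first two columns are invariant, which in a logo would correspond to the tallest letters.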
Sequence Identity. Pairwise sequence identity values were computed as percent of identical residues over the total number of aligned residues using alignments generated with T-Coffee.
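As a minimal sketch of this calculation, the Python function below computes percent identity for two rows of an alignment. Counting only columns in which both sequences carry a residue as "aligned residues" is an assumption about the definition used; the function name and the example sequences are hypothetical.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percent of identical residues over columns where both rows have a residue."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    aligned = 0
    identical = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue  # columns with a gap are not counted as aligned residues
        aligned += 1
        if a == b:
            identical += 1
    if aligned == 0:
        raise ValueError("no aligned residues to compare")
    return 100.0 * identical / aligned


# Five gap-free columns, four of them identical.
example_identity = percent_identity("ACDE-F", "ACDQGF")
```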
Phylogenetic Analysis. All maximum likelihood phylogenetic trees were built from T-Coffee amino acid alignments using the Quartet puzzling algorithm of the Tree-Puzzle software (36). The PAM250 mutation data matrix was chosen because it produced the smallest number of unresolved quartets in all cases. A total of 100,000 puzzling steps were used, and the rate of change was taken as site-independent (the use of a distributed variable rate of change among sites was tried and produced worse results in all cases). Test runs were also performed with Proml from the Phylip suite of programs (37). Constraint phylogenetic trees were generated using TreeView (38), where the internal branch labels, which are an estimate of branch assignment reliability, are Tree-Puzzle branch support values and must not be confused with the bootstrap values produced by other programs, e.g. Phylip.
Calculation of Evolutionary Rates. If a clock-like behavior for evolution is assumed, approximate rates of evolution for molecular sequences can be calculated from sequence distances, provided that divergence times between taxa are known from the fossil record (39). Distances between groups were calculated from the T-Coffee BGP and MGP multiple sequence alignments using the Grishin formula to correct for multiple substitutions and rate variation among sites (40). The mammalian-avian split (310 Myr ago), tetrapod-fish split (405 Myr ago), mammalian-amphibian split (200 Myr ago), radiation of teleosts (430 Myr ago), and radiation of mammals (100 Myr ago) (41–43) were used to convert evolutionary distances into PAM (point accepted mutations per 100 aa) per Myr. The final rates were averages of the values obtained with the above times. The clock-like assumption was checked with the Tajima test (44) and accepted at the 95% level for all triplets of sequences.
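The distance correction can be sketched numerically as follows. The relation q = ln(1 + 2d)/(2d), where q is the fraction of identical residues and d the number of substitutions per site, is the form of the Grishin correction usually quoted for rate variation among sites; that exactly this variant was applied here is an assumption. The function names are hypothetical, and the equation is solved for d by simple bisection.

```python
import math


def grishin_q(d):
    """Expected fraction of identical residues at distance d (substitutions/site).

    Uses q = ln(1 + 2d) / (2d), assumed here to be the relevant Grishin variant.
    """
    if d == 0:
        return 1.0
    return math.log1p(2.0 * d) / (2.0 * d)


def grishin_distance(q, iterations=200):
    """Invert grishin_q by bisection; q must lie in (0, 1]."""
    if not 0.0 < q <= 1.0:
        raise ValueError("q must be in the interval (0, 1]")
    if q == 1.0:
        return 0.0
    lo, hi = 0.0, 1.0
    # grishin_q decreases monotonically, so enlarge hi until it brackets the root.
    while grishin_q(hi) > q:
        hi *= 2.0
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if grishin_q(mid) > q:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# Distance for two sequences sharing half of their aligned residues.
d_half = grishin_distance(0.5)
```

Multiplying such a distance by 100 expresses it in PAM units; dividing by a calibration time then gives a rate in PAM/Myr.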
RESULTS
Cloning and Identification of New BGP and MGP Sequences. To increase the likelihood of identifying all available BGP and MGP sequences from the vast amount of sequence data, various homologues were used as query sequences to search public sequence data bases. Positive search results were considered to be members of the BGP/MGP family if they showed significant sequence similarity to any of the previously identified members and if this similarity extended throughout the protein. Numerous sequences previously identified as BGP or MGP were collected, and their accession numbers or reference number (59) in the literature is indicated in Fig. 1. Our search within public data bases also identified many ESTs and whole genome shotgun sequences with similarities to previously characterized sequences from which we could reconstruct 6 cDNAs (pig BGP and MGP, rainbow trout BGP and MGP, Atlantic salmon MGP, and channel catfish MGP) and 2 genes (torafugu MGP and green pufferfish MGP). Finally, 4 cDNAs (toadfish BGP and MGP, striped burrfish MGP, and Nile tilapia BGP) and 1 gene (zebrafish BGP) were amplified by reverse transcription-PCR from RNAs prepared from calcified tissues or by PCR from genomic DNA using the oligonucleotides listed in Table I. The identity of the amplified fragments was confirmed by sequence comparison with previously annotated sequences. Rhesus macaque MGP (Macaca mulatta, GenBank™ accession number AF162477) and steppe bison BGP (Bison priscus, Swiss-Prot accession number P83489) were not included in this study because the sequences were too small or identical to domestic cattle BGP, respectively. No evidence for the presence of any pseudogenes or alternatively spliced transcripts was found. The GenBank™/Swiss-Prot accession numbers of all osteocalcin and matrix Gla protein sequences used in this study are listed in Fig. 1.
Taxa Represented. A total of 48 complete or nearly complete sequences (28 BGPs and 20 MGPs) have been identified from 32 different species representing most classes of vertebrates (and only in vertebrates). These include mammals, birds, amphibians, bony fish, and cartilaginous fish (Fig. 1). Only one homologue of MGP and BGP was identified per species with one exception (the mouse, with a cluster of three BGP-related genes (45)). No homologues were found in archaea, bacteria, viruses, fungi, plants, arthropods, nematodes, or non-vertebrate chordates, in particular in the complete genomes of Saccharomyces cerevisiae, Drosophila melanogaster, Arabidopsis thaliana, Caenorhabditis elegans, and Ciona intestinalis, the closest non-vertebrate relative of vertebrates with a completely sequenced genome.
Structure of MGP and BGP Genes. BGP and MGP gene structure organization was determined from 16 genes representing 8 different species (non-fish species: African clawed frog, human, Norway rat, and house mouse; fish species: gilthead seabream, green pufferfish, torafugu, and zebrafish). For each new gene, putative splice sites and potential coding regions were predicted using GenScan. Predictions were verified by aligning the predicted amino acid sequences with those already characterized in other species and further confirmed by comparison with known gene structures. Results presented in Fig. 2 show that BGP and MGP genes share the same simple gene organization with four coding exons and three introns splitting the coding region. The phase of introns positioned in the coding sequence is also rigorously conserved not only among all BGP and all MGP genes but also between BGP and MGP genes (Table II). The intron phases, as defined by Patthy (46), in both BGP and MGP genes are: phase 1 for the first intron, phase 1 for the second intron, and phase 2 for the third intron. Although the lengths of exons are well conserved within as well as between BGP and MGP genes (exon 4 > exon 3 > exon 1 > exon 2), the total size of introns varies from 491 to 2514 bp in BGP genes and from 366 to 4461 bp in MGP genes (Table II). These variations are not related to the size of the genome (species with long introns are not necessarily those with big genomes; Table III), to a subset of sequences (fish versus non-fish sequences; Fig. 2), or to the compactness of the chromosome region to which they belong. Altogether, these results suggest an overall conservation of gene structure between MGP and BGP.
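The intron phase can be derived from the lengths of the coding parts of the exons: it is the number of coding nucleotides preceding the intron, modulo 3. The Python sketch below illustrates this; the function name and the exon lengths are hypothetical (chosen only to reproduce the 1-1-2 phase pattern and the exon size ranking reported above) and are not values taken from Fig. 2.

```python
def intron_phases(coding_exon_lengths):
    """Return the phase of each intron separating consecutive coding exons.

    Phase 0: intron lies between two codons.
    Phase 1: intron lies between the first and second nucleotides of a codon.
    Phase 2: intron lies between the second and third nucleotides of a codon.
    """
    phases = []
    coding_so_far = 0
    for length in coding_exon_lengths[:-1]:  # no intron follows the last exon
        coding_so_far += length
        phases.append(coding_so_far % 3)
    return phases


# Hypothetical coding lengths (bp) for exons 1 to 4, start to stop codon inclusive.
hypothetical_exons = [61, 33, 76, 142]
phases = intron_phases(hypothetical_exons)
```

A useful sanity check on any such set of lengths is that their sum is a multiple of 3, as required for an intact open reading frame.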
TABLE II. Phase and length of introns present in BGP and MGP coding region. Phase of intron is defined according to Patthy (46) and is indicated in parentheses: (ii) intron lying between the first and second nucleotides of a codon (phase 1 intron); (iii) intron lying between the second and third nucleotides of a codon (phase 2 intron). Intron sizes are in base pairs.
Conserved Features of MGP and BGP. In pairwise comparisons of all BGP or MGP protein sequences (Table IV), overall sequence identity (in percent) was 63 ± 11 (fish BGP), 65 ± 15 (amphibian and bird BGP), 79 ± 11 (mammalian BGP), 52 ± 15 (fish MGP), 50 ± 0 (amphibian and bird MGP), and 84 ± 3 (mammalian MGP), indicating more constrained sequences in mammals, especially for MGP. Protein size was found to be significantly different between BGPs and MGPs, the latter being larger, a difference to which the absence of a propeptide in MGP contributes. Accordingly, BGPs were 97 ± 4 aa (unprocessed protein) and 48 ± 2 aa (mature protein), whereas MGPs were 110 ± 10 aa (unprocessed protein) and 92 ± 10 aa (mature protein). Higher variation in length within MGPs can be attributed to a larger protein in fish (116 ± 11 aa (unprocessed protein) and 97 ± 10 aa (mature protein) in fish versus 103 ± 0 aa (unprocessed protein) and 84 ± 0 aa (mature protein) in non-fish sequences) resulting mainly from an extended C terminus.
TABLE IV. Pairwise percent identities among BGP (lower triangle) and MGP (upper triangle) protein sequences. Diagonal bold values are sequence lengths, and shaded areas indicate identities within different groups of organisms (black, birds and amphibians; dark gray, mammals; light gray, fish). The identity values are calculated from alignments described in the text.
Analysis of MGP and BGP logos (Fig. 3) generated from T-Coffee multiple sequence alignments identified highly conserved residues (23 in MGP and 20 in BGP) that have been further analyzed using bibliographical data and tools available through the Internet. The following conserved domains were identified. 1) A transmembrane signal peptide was found in both BGP and MGP to control protein entry into the secretory pathway. Signal peptides were predicted to be 16–21 and 18–25 aa long in MGP and BGP, respectively. 2) A propeptide was found in BGP but not in MGP. The propeptide is cleaved by furin at the RX(K/R)R polybasic cleavage site that is present in all BGPs with one exception (meagre BGP, where the cleavage site is RXXR). This is a common feature for proteins known to require post-translational proteolytic processing such as clotting factors (47). 3) A phosphorylation domain was found in MGP (48) as well as 4) a γ-glutamyl carboxylation recognition site, which targets the VKD proteins to the γ-glutamyl carboxylase (GGCX). It is most likely that this region docks with the membrane-bound GGCX, bringing the active site of the enzyme in close proximity to the substrate glutamate residues on the precursor form of VKD proteins. 5) An ANXF domain was found in MGP, which is likely to be a proteolytic cleavage site involved in post-translational processing. 6) Two invariable cysteine residues required for an intramolecular disulfide bridge were also identified. 7) A C-terminal Gla domain was found in both BGP and MGP, where the majority of the Glu residues are carboxylated. This domain is responsible for the high affinity binding of calcium ions. A conserved motif is found in the middle of the domain, which seems to be important for substrate recognition by the carboxylase. 8) A C-terminal carboxypeptidase processing site was found, at which the exposed C-terminal basic residues are removed.
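Short linear motifs such as RX(K/R)R, RXXR, and ANXF can be located in a protein sequence with regular expressions, as in the following Python sketch. The dictionary keys, the function name, and the test peptide are hypothetical; the peptide is an invented string, not a real BGP or MGP sequence. Positions are reported 1-based, and a lookahead is used so that overlapping occurrences are not missed.

```python
import re

# Motifs named in the text, written as regular expressions (X = any residue).
MOTIFS = {
    "RX(K/R)R": r"R.[KR]R",
    "RXXR": r"R..R",
    "ANXF": r"AN.F",
}


def find_motifs(protein):
    """Return {motif name: [(1-based start position, matched residues), ...]}."""
    hits = {}
    for name, pattern in MOTIFS.items():
        # The zero-width lookahead reports overlapping matches as well.
        overlapping = re.compile("(?=(" + pattern + "))")
        hits[name] = [(m.start() + 1, m.group(1)) for m in overlapping.finditer(protein)]
    return hits


# Invented peptide used only to exercise the patterns.
toy_peptide = "MKSLRAKRYLAANSFQ"
toy_hits = find_motifs(toy_peptide)
```

Because RX(K/R)R is a special case of RXXR, any hit for the former is also reported for the latter.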
Interestingly, when relating the gene structure to the protein domains of MGP/BGP, it is clear that each coding exon always encodes the same protein domain(s) (Fig. 4). Exon 1 encodes the complete transmembrane signal peptide and its cleavage site; exon 3 encodes the carboxylase recognition site, a proteolytic cleavage site (ANXF in MGP and RXXR in BGP), and part of the Gla domain; and exon 4 encodes the major part of the Gla domain including the canonical cysteine residues. Exon 2, the smallest exon in both genes, encodes a domain exhibiting less conservation; it is a highly conserved phosphorylation domain in MGP and a poorly conserved region of the BGP propeptide with no function assigned yet (Figs. 3 and 4). Altogether, these results suggest an overall conservation of protein structure between MGP and BGP.

FIG. 3. Conserved features among vertebrate MGP (A) and BGP (B) proteins. Sequence logos reveal the conservation of residues at particular positions in the protein sequence. Highly conserved residues are in black. *, Gla residues; 1, tyrosine at position 22 in MGPs, except in Ictalurus punctatus MGP; 2, serine at position 32 in MGPs, except in G. galeus MGP; 3, glutamic acid at position 88 in MGPs, except in mammals; 4, aspartic acid at position 89 in MGPs, except in G. galeus MGP; 5, proline at position 74 in BGPs, except in fish; 6, leucine at position 91 in BGPs, except in fish; 7, only in fish. CP, carboxypeptidase.
Putative serine, tyrosine, and threonine phosphorylation sites were identified for each sequence using NetPhos. No conserved phosphorylation site was identified in BGP, whereas three highly conserved serine phosphorylation sites (positions 25, 28, and 31 in the consensus sequence) were found in MGP (Fig. 3A). Interestingly, a serine phosphorylation site (position 78) and two tyrosine phosphorylation sites (positions 90 and 96) were also highly conserved but only in fish and amphibians (Ser-78), in mammals (Tyr-90) or in mammals and amphibians (Tyr-96).
Domains identified through the T-Coffee/Logo analysis were converted to PROSITE-like domain definition (Table V) and used to scan Swiss-Prot (release 45.0) and TrEMBL (release 28.0) vertebrate entries using the ScanPROSITE facilities at us.expasy.org. MGP phosphorylation domain definition identified most MGPs (only one exception, Galeorhinus galeus MGP), other matrix- or bone-related proteins (dentin matrix protein 1 and osteopontin-like protein), and proteins apparently unrelated to mineralization mechanisms. The ANXF domain definition identified all MGPs, the VKD TMG4 protein, osteomodulin (extracellular matrix protein produced by osteoblasts), and many other proteins including protein kinases and membrane receptors. GlaMGP and GlaBGP domain definitions identified all and only MGPs and BGPs, respectively. Similarly the GlaXGP domain definition identified all and only XGPs (bone and matrix Gla proteins). Therefore the BGP/MGP Gla domain can be considered the core family domain. The efficiency of the GlaXGP pattern in identifying BGPs and MGPs was evaluated using Swiss-Prot release 45.0 and TrEMBL release 28.0 (containing 25 BGPs and 16 MGPs) and compared with similar published pattern definitions including (i) PROSITE PS00011, (ii) Pfam PF00594, (iii) Smart SM00069, and (iv) InterPro IPR000294 and IPR002384. Published pattern definitions never identified the whole set of BGP/MGP sequences and always picked up other sequences, mostly other members of the VKD protein family (Table VI). On the contrary, the GlaXGP pattern definition identified all and only reference sequences demonstrating its specificity and its suitability to identify bone and matrix Gla proteins.
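To illustrate how a PROSITE-like definition is used for scanning, the Python sketch below converts a pattern written in the usual PROSITE syntax (elements separated by "-", "x" for any residue, "[..]" for allowed residues, "{..}" for excluded residues, "(n)" or "(n,m)" for repeats, "<" and ">" for terminal anchors) into a regular expression. The converter is a simplified sketch covering only these elements, and the example pattern is a toy one, not the GlaXGP definition of Table V.

```python
import re

_ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regular expression."""
    pieces = []
    for element in pattern.rstrip(".").split("-"):
        prefix = "^" if element.startswith("<") else ""
        suffix = "$" if element.endswith(">") else ""
        match = _ELEMENT.fullmatch(element.strip("<>"))
        if match is None:
            raise ValueError("unsupported pattern element: " + element)
        core, low, high = match.groups()
        if core == "x":
            piece = "."                       # any residue
        elif core.startswith("{"):
            piece = "[^" + core[1:-1] + "]"   # excluded residues
        else:
            piece = core                      # single residue or [allowed set]
        if low:
            piece += "{" + low + ("," + high if high else "") + "}"
        pieces.append(prefix + piece + suffix)
    return "".join(pieces)


# Toy pattern and invented peptide, for illustration only.
toy_pattern = "E-x(2)-[ERK]-E-x-C-{P}-x(2,3)-Y"
toy_regex = prosite_to_regex(toy_pattern)
toy_match = re.search(toy_regex, "AAEPRKEVCAGGYAA")
```

Scanning a set of sequences with the resulting expression and counting matches inside and outside the reference set gives the kind of comparison summarized in Table VI.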
TABLE VI. Efficiency of selected pattern definitions to identify MGP and BGP reference sequences (containing at least the mature protein) searching Swiss-Prot (release 45.0) and TrEMBL (release 28.0) data bases
Evolutionary Aspects. Maximum likelihood Quartet puzzling phylogenetic trees were generated from BGP, MGP, and Gla domain T-Coffee multiple sequence alignments (28, 20, and 48 sequences, respectively) excluding poorly conserved regions (Fig. 5). Phylogenetic trees were rooted with either the sequence of the most ancient species (G. galeus MGP (GgaMGP) for the MGP and Gla domain trees; Fig. 5, A and C) or the most ancient taxon (bony fish for the BGP tree; Fig. 5B). In agreement with the generally accepted phylogeny of vertebrates, our analysis clustered amphibian, avian, and mammalian sequences separately from those of fish. The unexpected position of the toadfish MGP (Halobatrachus didactylus MGP) in the MGP tree (toadfish is part of the Paracanthopterygii superorder, and H. didactylus MGP is therefore not expected to cluster together with proteins from Acanthopterygii) was confirmed by the cloning and the sequencing of a 194-bp MGP cDNA fragment from a closely related species, the oyster toadfish Opsanus tau (GenBank™ accession number AY383483).² The peptide encoded (including 65% of the mature protein and most of the Gla domain) was 100% identical to H. didactylus MGP. This result may be due to the lack of a sufficient phylogenetic signal, caused by both short sequence length and high conservation of fish MGP sequences. Phylogenetic analysis of the Gla domain supports the notion that MGP and BGP represent distinct evolutionary groups (Fig. 5C), as indicated by the relatively high branching support values.

FIG. 4. Structural organization of MGP and BGP proteins. Exonic structure of MGP and BGP coding sequence is indicated below each protein structure. Dashed line represents intramolecular disulfide bond. SP, signal peptide; PP, propeptide; MP, mature peptide; GGCX, γ-glutamyl carboxylase recognition site; P, phosphorylation; C, conserved cysteine residues; ANXF, RXXR, and RR indicate proteolytic cleavage sites; E, exon.
Based on the calibration points described under "Experimental Procedures," both BGP and MGP averaged evolutionary rates that were close to 1.00 × 10⁻³ changes/site/Myr. These rates place BGP and MGP in the class of slowly evolving proteins (49), indicating a strong evolutionary pressure to conserve the sequence and likewise the structure.
Time Scale for BGP and MGP Origin. To estimate approximate times for the emergence of the MGP and BGP sequences, average intragroup evolutionary distances were estimated for the BGP and MGP sequences using the Grishin formula and the Gla region alignment of the 48 sequences. These average distances were then converted into time units using the previously estimated evolutionary rate of 0.1 PAM/Myr. Based on this estimate, the origin of MGP is placed at 480 ± 133 Myr ago and that of BGP at 381 ± 102 Myr ago. Although these values are only indicative, they agree well with our hypothesis for the origin of MGP and BGP in relation to the phylogeny of animal species (see "Discussion" and Fig. 7).
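The unit conversions involved can be made explicit with a short Python sketch. It assumes a simple division of distance by rate; whether pairwise distances were halved to account for two diverging lineages is not stated, so no such factor is applied. The function names are hypothetical, and the distance of 48 PAM is merely back-calculated from the reported 480 Myr for illustration, not a value given in the text.

```python
import math


def pam_per_myr_to_changes_per_site(rate_pam_per_myr):
    """1 PAM = 1 accepted point mutation per 100 aa, i.e. 0.01 change per site."""
    return rate_pam_per_myr / 100.0


def distance_to_time_myr(distance_pam, rate_pam_per_myr):
    """Convert an evolutionary distance (PAM) into time (Myr) under a strict clock."""
    if rate_pam_per_myr <= 0:
        raise ValueError("rate must be positive")
    return distance_pam / rate_pam_per_myr


rate = 0.1  # PAM/Myr, the rate quoted in the text
rate_per_site = pam_per_myr_to_changes_per_site(rate)  # about 1e-3 changes/site/Myr
illustrative_time = distance_to_time_myr(48.0, rate)   # about 480 Myr
```

The first conversion shows that 0.1 PAM/Myr and 1.00 × 10⁻³ changes/site/Myr are the same rate expressed in different units.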
DISCUSSION
This work identified 13 new sequences (8 MGPs and 5 BGPs) through cDNA cloning or sequence reconstruction, increasing the number of available sequences to 48 (20 MGPs and 28 BGPs). Taking advantage of the resulting large amount of data, we have compared MGP and BGP genes and proteins, created multiple sequence alignments to identify conserved features likely to be important for protein structure, function, and regulation, and performed a phylogenetic analysis of both proteins.
MGP and BGP Are Vertebrate-specific Proteins. This work identified sequences across 32 different species representing most classes of vertebrates (including mammals, birds, amphibians, bony fish, and cartilaginous fish) but none in other organisms (including archaea, bacteria, viruses, fungi, plants, arthropods, nematodes, and non-vertebrate chordates). An extensive in silico analysis of public data bases available through the Internet failed to identify a Gla domain within the non-vertebrate genomes that are already complete. However, the fruit fly D. melanogaster protein Msp-300 (muscle-specific protein 300) (50) has shown some homology with the MGP Gla domain.³ The homologous region is located at the C terminus of Msp-300 and is only 18 aa long (Fig. 6). Despite being short, this region contains important MGP-conserved features such as the invariant cysteine residues involved in the intramolecular disulfide bridge and 3 Glu residues shown to be γ-carboxylated in vertebrates. A vitamin K-dependent γ-glutamyl carboxylase has also been described in D. melanogaster (51), suggesting that Glu residues present within the pseudo-Gla domain of Msp-300 could be γ-carboxylated and play a role in calcium binding. Despite similarities in both sequence and tissue distribution (i.e. both proteins are expressed in muscle cells), Msp-300 is clearly not a matrix Gla protein, and similarities observed in protein sequences may only indicate that MGP and Msp-300 ancestors shared at some point the same Gla-like domain, which then evolved differently but retained some characteristic features.

FIG. 5. Quartet puzzling tree of MGPs (A), BGPs (B), and Gla domain (C) sequences, with maximum likelihood distances computed with the Dayhoff PAM250 matrix. Internal numbers are supporting values for the corresponding branch points (support values below 50% were omitted, producing multifurcated branching). The BGP tree is rooted with the bony fish group of sequences, and both the MGP and Gla domain trees are rooted with the soupfin shark sequence. Cartilag., cartilaginous.

FIG. 6. Msp-300 pseudo-Gla domain. MGP sequence is presented as sequence logos, where highly conserved residues are displayed as large black characters and an asterisk indicates Gla residues found in vertebrate proteins. Disulfide bond linking invariant cysteine residues is also indicated.
MGP- and BGP-conserved Features
Many invariant residues (shared or not between MGP and BGP) have been identified in consensus sequences obtained from multiple alignments. Their conservation over more than 450 million years of evolution strongly suggests that these residues are required for correct protein structure or to preserve a critical function or interaction (e.g. MGP with BMP2). It has been possible to cluster these conserved residues into various functional domains and post-translational processing sites. Some of them have been identified in previous studies (48, 52, 53), but one, the putative ANXF peptide cleavage site, is described here for the first time, although it had been observed previously (Footnote 4). This site is present in MGP, and cleavage at this site would separate what is considered the core of the protein (the C terminus containing the Gla domain) from the N terminus containing the phosphorylation and the GGCX domains. The purification of low molecular weight MGP fragments, which identified a cleavage site after the asparagine (N) residue of the ANXF domain, further confirmed the post-translational proteolytic processing of MGP at this site (12) (Footnote 5).

FIG. 7. MGP and BGP emergence during vertebrate evolution as a result of genome duplication events. Evolution time is in millions of years (Myr). Breaks in arrows indicate that lines of evidence of the existence of MGP and BGP are from preliminary unpublished results, i.e. MGP- and BGP-like immunoreactive proteins have been observed in lamprey and in shark, respectively (Footnote 7). Cartilag., cartilaginous; N.D., not detected.
In silico analyses of available sequences predicted three highly conserved serine phosphorylation sites within the N-terminal domain of MGP and none in BGP, in agreement with previous in vitro studies (48) (Footnote 6). It has been proposed recently that the Golgi casein kinase could be responsible for MGP phosphorylation at these sites (54), a post-translational modification that would serve as a sorting signal for MGP.
MGP and BGP Phylogeny
Vertebrate phylogeny estimated from the BGP and MGP sets of sequences is consistent with the generally accepted phylogeny of vertebrates estimated from morphological and palaeontological data and also agrees with previous studies (7, 13, 55), even though a more extensive set of sequences was used here. A number of branches display low support values, indicating poor agreement between the multiple solutions produced by the Quartet puzzling algorithm (36). This is most apparent for BGP, where most branches have values below 90% (support values below 50% were omitted, producing multifurcated branching). The shorter length of the BGP sequences and their greater degree of conservation may contribute to the low support values because they result in a weak phylogenetic signal. Despite these limitations, the separation between BGP sequences of fish and land animals is strongly supported. The Quartet puzzling tree for MGP is much better defined, with only one support value lower than 50% and many greater than 90%. Clustering of the various species agrees exactly with the accepted phylogeny of vertebrates except for the clustering of H. didactylus MGP and S. aurata MGP. As described previously, the H. didactylus MGP sequence is known with a high degree of certainty, so the unexpectedly high similarity between the two sequences most likely results from random mutation noise obscuring the phylogenetic signal. Molecular trees obtained from the BGP and MGP sets of sequences are gene trees, not species trees, and a number of evolutionary processes can introduce differences between a correctly estimated gene phylogeny and the correct species phylogeny: horizontal transfer, duplication and loss, and deep coalescence (56).
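The omission of branches with support below 50%, which produces the multifurcated branching mentioned above, can be illustrated with a minimal Python sketch. This is not the Tree-Puzzle implementation; the `Node` class, the function names, and the toy tree (taxon labels and support values) are hypothetical and serve only to show the collapsing step.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """A tree node; leaves carry a name, internal nodes carry children."""
    name: Optional[str] = None
    support: Optional[float] = None  # percent support for the branch above this node
    children: List["Node"] = field(default_factory=list)


def collapse_low_support(node: Node, threshold: float = 50.0) -> Node:
    """Return a copy of the tree in which internal branches with support
    below `threshold` are removed and their children attached to the parent."""
    new_children: List[Node] = []
    for child in node.children:
        collapsed = collapse_low_support(child, threshold)
        is_internal = bool(collapsed.children)
        if is_internal and collapsed.support is not None and collapsed.support < threshold:
            # Weak branch: promote its children, creating a multifurcation.
            new_children.extend(collapsed.children)
        else:
            new_children.append(collapsed)
    return Node(name=node.name, support=node.support, children=new_children)


def to_newick(node: Node) -> str:
    """Serialise the tree as a Newick-like string, support values as internal labels."""
    if not node.children:
        return node.name or ""
    inner = ",".join(to_newick(c) for c in node.children)
    label = "" if node.support is None else f"{node.support:g}"
    return f"({inner}){label}"


# Hypothetical toy tree: names and support values are invented for illustration.
toy = Node(children=[
    Node(name="shark"),
    Node(support=95, children=[
        Node(support=42, children=[Node(name="fishA"), Node(name="fishB")]),
        Node(name="frog"),
    ]),
])
collapsed_newick = to_newick(collapse_low_support(toy)) + ";"
print(collapsed_newick)
```

In the toy tree the branch grouping `fishA` and `fishB` has 42% support, so it is dropped and the two leaves join `frog` in a three-way split under the 95% branch.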
Overall, phylogenetic analysis based exclusively on the Gla motif region of both MGP and BGP is able to differentiate the two proteins and to separate land from sea taxa, but beyond that all branching is very poorly defined. This is most likely because of the small number of aligned residues and the high degree of sequence conservation. It must nevertheless be stressed that the Gla motif region alone contains enough information to make both distinctions. When more structural information is available, it may be possible to correlate these differences with adaptation to the different environments.
Origin and Evolution of BGP and MGP
It is quite clear that BGP and MGP have a common origin and that they are more closely related to each other than to any other VKD protein. This is well supported by the database searches, the exon patterns, and the protein sequence alignments showing that 1) BGP and MGP genes share the same simple gene organization and protein structure and 2) the C-terminal domain signature is similar in MGP and BGP. It is also clear that BGP and MGP are nearly if not completely absent in most non-vertebrate taxa. Levels of sequence similarity between BGP and MGP are much higher than one would expect to occur by convergence, suggesting that this gene group originated through gene duplication followed by sequence divergence. This duplication (gene, chromosome, or genome duplication) probably occurred very early in vertebrate evolution, being almost certainly an ancient event. Strong evidence for whole genome duplication has been shown for vertebrates (57). After the divergence of the cephalochordates from the chordate line, a genomic duplication occurred before jawless fish evolved around 500 Myr ago (Fig. 7). Another genomic duplication event probably led to the evolution of jawed fish around 400 Myr ago (Fig. 7). These two duplications led to the development of many distinctive vertebrate features, such as cartilage and bone. We hypothesize that the first genome duplication (before the branching of jawless fish) gave rise to the ancestor gene of MGP and that the second genome duplication (before the branching of cartilaginous fish) produced the BGP ancestor gene. The BGP (380 Myr ago) and MGP (480 Myr ago) emergence times estimated from the evolutionary rate agree well with this conjecture. The appearance of MGP would be followed by cartilage formation and that of BGP by bone formation. After duplication, gene duplicates (BGP in this case) often experience relaxed evolutionary constraints. This promotes functional diversification of duplicates and biochemical innovation through mutations and recombination. In other words, duplicates evolve to acquire new functions. Several more speculative lines of evidence for BGP being a duplicate of MGP have been collected: 1) an MGP-like immunoreactive protein has been observed in lamprey (the most ancient species tested), whereas BGP was not detected (Footnote 7); 2) MGP is associated with cartilage, which appeared with the first vertebrates, whereas BGP is only associated with bone, a structure that appeared later in evolution; and 3) BGP seems to be better conserved than MGP.
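As a rough illustration of how an emergence time can be derived from an evolutionary rate, the following Python sketch applies the standard molecular clock relation t = d / (2r), with d obtained by Poisson correction of the observed proportion of differing residues. This is a generic textbook calculation and not necessarily the procedure used in this study; the function names and all numerical inputs are hypothetical.

```python
import math


def poisson_distance(p: float) -> float:
    """Poisson-corrected number of substitutions per site, computed from the
    proportion `p` of differing aligned residues (requires 0 <= p < 1)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    return -math.log(1.0 - p)


def divergence_time(d: float, rate: float) -> float:
    """Time since two lineages split, t = d / (2 * rate), where `rate` is in
    substitutions per site per Myr along a single lineage."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    return d / (2.0 * rate)


# Hypothetical inputs chosen only to illustrate the arithmetic.
p_example = 0.5        # half of the aligned residues differ
rate_example = 1.0e-3  # substitutions per site per Myr
d_example = poisson_distance(p_example)
t_example = divergence_time(d_example, rate_example)
print(f"d = {d_example:.3f} substitutions/site, t = {t_example:.0f} Myr")
```

The factor of 2 reflects that substitutions accumulate independently along both lineages after the split, so the pairwise distance grows twice as fast as the per-lineage rate.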
Although the existence of BGP and MGP paralogues and/or pseudogenes cannot be ruled out, neither our survey nor our cloning work in different species identified additional genes that could have resulted from vertebrate genome duplication, which tends to confirm this hypothesis. The only exception identified so far, the mouse BGP gene cluster (45), is likely to reflect a recent duplication event that occurred after the rodent branching.
Conclusions
Our results support the hypotheses that: 1) bone and matrix Gla protein genes originated from two genome duplications that occurred around 500 and 400 Myr ago, before jawless and jawed fish evolved, respectively; 2) cartilage and bone tissues arose concomitantly with the appearance of MGP and BGP, respectively; and 3) BGP is a duplicate of MGP. Finally, we propose highly specific pattern definitions for the BGP and MGP Gla domains that should help to reliably identify bone and matrix Gla proteins among the growing number of non-annotated sequences originating from the numerous sequencing projects.
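To show how a pattern definition of this kind might be used to screen non-annotated sequences, here is a minimal Python sketch that converts a PROSITE-style pattern into a regular expression and scans a protein sequence with it. The pattern and sequence below are invented toy examples, not the BGP or MGP Gla domain patterns proposed in this work, and the function names are hypothetical. Only the basic syntax elements are handled (single residues, `x`, `[...]`, `{...}`, and repeat counts); anchors are not supported.

```python
import re
from typing import Iterator, Tuple


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern such as 'C-x(2)-[DE]-{P}-C'
    into a Python regular expression string."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        # Separate an optional repeat suffix, '(3)' or '(2,4)', from the residue spec.
        match = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        if match is None:
            raise ValueError(f"cannot parse element {element!r}")
        spec, low, high = match.groups()
        if spec == "x":
            token = "."                        # any residue
        elif re.fullmatch(r"\[[A-Z]+\]", spec):
            token = spec                       # one of the listed residues
        elif re.fullmatch(r"\{[A-Z]+\}", spec):
            token = "[^" + spec[1:-1] + "]"    # any residue except those listed
        elif re.fullmatch(r"[A-Z]", spec):
            token = spec                       # a single fixed residue
        else:
            raise ValueError(f"unsupported element {element!r}")
        if low is not None:
            token += "{" + low + ("," + high if high is not None else "") + "}"
        parts.append(token)
    return "".join(parts)


def scan(sequence: str, pattern: str) -> Iterator[Tuple[int, str]]:
    """Yield (1-based start position, matched segment) for each
    non-overlapping match of the pattern in the sequence."""
    regex = re.compile(prosite_to_regex(pattern))
    for m in regex.finditer(sequence):
        yield m.start() + 1, m.group()


# Toy pattern and sequence, invented for illustration only.
toy_pattern = "C-x(2)-[DE]-{P}-C"
toy_sequence = "AAACGHDACAA"
hits = list(scan(toy_sequence, toy_pattern))
print(prosite_to_regex(toy_pattern), hits)
```

Because `re.finditer` reports non-overlapping matches only, a production scan that needs overlapping hits would require a different search strategy.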
 |
FOOTNOTES
|
|---|
* This work was supported in part by Portuguese Science and Technology Foundation Grants PRAXIS/BIA/11159/98, POCTI/34668/Fis/2000, and POCTI/BCI/48748/2002, the latter two including funds from Fundo Europeu de Desenvolvimento Regional and Orçamento do Estado. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
The nucleotide sequences reported in this paper have been submitted to the DDBJ/GenBank™/EBI Data Bank with accession numbers AF525316, AY383483, AY150038, AF526377, AY178836, AY239015, AF144707, AY182238, AY233378, AY182239, AY112747, AF479081, AY298910, and AY294644.
Supported by Portuguese Science and Technology Foundation Postdoctoral Fellowship BPD/1607/2000. To whom correspondence should be addressed: Centro de Ciências do Mar, Universidade do Algarve, Campus de Gambelas, 8005-139 Faro, Portugal. Tel.: 351-289800971; Fax: 351-289818353; E-mail: vlaize{at}ualg.pt.
1 The abbreviations used are: BGP, bone Gla protein; MGP, matrix Gla protein; VKD, vitamin K-dependent; EST, expressed sequence tag; aa, amino acid; Myr, million years; PAM, point accepted mutations; GGCX, γ-glutamyl carboxylase.
2 C. S. B. Viegas, V. Laizé, and M. L. Cancela, unpublished results. 
3 P. A. Price and M. K. Williamson, unpublished data. 
4 P. A. Price and M. K. Williamson, unpublished data. 
5 P. A. Price, personal communication.
6 P. A. Price, unpublished laboratory data. 
7 D. Simes, personal communication. 
 |
ACKNOWLEDGMENTS
|
|---|
We thank T. R. Gregory from the American Museum of Natural History, New York, for the measurement of S. aurata genome size.
 |
REFERENCES
|
|---|
1. King, K. (1978) Biochim. Biophys. Acta 542, 542-546
2. Walker, C. S., Shetty, R. P., Clark, K., Kazuko, S. G., Letsou, A., Olivera, B. M., and Bandyopadhyay, P. K. (2001) J. Biol. Chem. 276, 7769-7774
3. Stanley, T., Stafford, D., Olivera, B., and Bandyopadhyay, P. (1997) FEBS Lett. 407, 85-88
4. Craig, A. G., Bandyopadhyay, P., and Olivera, B. M. (1999) Eur. J. Biochem. 264, 271-275
5. Cancela, M. L., Williamson, M. K., and Price, P. A. (1995) Int. J. Pept. Protein Res. 46, 419-423
6. Rice, J. S., Williamson, M. K., and Price, P. A. (1994) J. Bone Miner. Res. 9, 567-576
7. Pinto, J. P., Ohresser, M. C. P., and Cancela, M. L. (2001) Gene 270, 77-91
8. Proudfoot, D., Skepper, J. N., Shanahan, C. M., and Weissberg, P. L. (1998) Arterioscler. Thromb. Vasc. Biol. 18, 379-388
9. Yagami, K., Suh, J. Y., Enomoto-Iwamoto, M., Koyama, E., Abrams, W. R., Shapiro, I. M., Pacifici, M., and Iwamoto, M. (1999) J. Cell Biol. 147, 1097-1108
10. Fraser, J. D., and Price, P. A. (1988) J. Biol. Chem. 263, 11033-11036
11. Pinto, J. P., Conceição, N., Gavaia, P. J., and Cancela, M. L. (2003) Bone 32, 201-210
12. Simes, D. C., Williamson, M. K., Ortiz-Delgado, J. B., Viegas, C. S. B., Price, P. A., and Cancela, M. L. (2003) J. Bone Miner. Res. 18, 244-259
13. Cancela, M. L., Ohresser, M. C. P., Reia, J. P., Viegas, C. S. B., Williamson, M. K., and Price, P. A. (2001) J. Bone Miner. Res. 16, 1611-1621
14. Ortiz-Delgado, J. B., Simes, D. C., Gavaia, P. J., Sarasquete, C., and Cancela, M. L. (2005) Histochem. Cell Biol., in press
15. Luo, G., D'Souza, R., Hogue, D., and Karsenty, G. (1995) J. Bone Miner. Res. 10, 325-334
16. Luo, G., Ducy, P., McKee, M. D., Pinero, G. J., Loyer, E., Behringer, R. R., and Karsenty, G. (1997) Nature 386, 78-81
17. Munroe, P. B., Olgunturk, R. O., Fryns, J. P., Van Maldergem, L., Ziereisen, F., Yuksel, B., Gardiner, R. M., and Chung, E. (1999) Nat. Genet. 21, 142-144
18. Meier, M., Weng, L. P., Alexandrakis, E., Ruschoff, J., and Goeckenjan, G. (2001) Eur. Respir. J. 17, 566-569
19. Schinke, T., and Karsenty, G. (2000) Nephrol. Dial. Transplant. 15, 1272-1274
20. Fraser, J. D., and Price, P. A. (1990) Calcif. Tissue Int. 46, 270-279
21. Kirfel, J., Kelter, M., Cancela, L. M., Price, P. A., and Schüle, R. (1997) Proc. Natl. Acad. Sci. U. S. A. 94, 2227-2232
22. Cancela, M. L., and Price, P. A. (1992) Endocrinology 130, 102-108
23. Farzaneh-Far, A., Proudfoot, D., Weissberg, P. L., and Shanahan, C. M. (2000) Biochem. Biophys. Res. Commun. 277, 736-740
24. Conceição, N., Henriques, N. M., Ohresser, M. C. P., Hublitz, P., Schüle, R., and Cancela, M. L. (2002) Eur. J. Biochem. 269, 1947-1956
25. Cancela, M. L., Hu, B., and Price, P. A. (1997) J. Cell. Physiol. 171, 125-134
26. Ducy, P., Desbois, C., Boyce, B., Pinero, G., Story, B., Dunstan, C., Smith, E., Bonadio, J., Goldstein, S., Gundberg, C., Bradley, A., and Karsenty, G. (1996) Nature 382, 448-452
27. Boskey, A. L., Gadaleta, S., Gundberg, C., Doty, S. B., Ducy, P., and Karsenty, G. (1998) Bone 23, 187-196
28. Dowd, T., Rosen, J., Li, L., and Gundberg, C. (2003) Biochemistry 42, 7769-7779
29. Hoang, Q., Sicheri, F., Howard, A., and Yang, D. (2003) Nature 425, 977-980
30. Frazão, C., Simes, D. C., Coelho, R., Alves, D., Williamson, M. K., Price, P. A., Cancela, M. L., and Carrondo, M. A. (2005) Biochemistry 44, 1234-1242
31. Murshed, M., Schinke, T., McKee, M. D., and Karsenty, G. (2004) J. Cell Biol. 165, 625-630
32. Thompson, J. D., Gibson, T. J., Plewniak, F., Jeanmougin, F., and Higgins, D. G. (1997) Nucleic Acids Res. 25, 4876-4882
33. Hardie, D. C., Gregory, T. R., and Hebert, P. D. N. (2002) J. Histochem. Cytochem. 50, 735-749
34. Notredame, C., Higgins, D. G., and Heringa, J. (2000) J. Mol. Biol. 302, 205-217
35. Schneider, T. D., and Stephens, R. M. (1990) Nucleic Acids Res. 18, 6097-6100
36. Schmidt, H. A., Strimmer, K., Vingron, M., and von Haeseler, A. (2002) Bioinformatics 18, 502-504
37. Felsenstein, J. (1989) Cladistics 5, 164-166
38. Page, R. D. (1996) Comput. Appl. Biosci. 12, 357-358
39. Kumar, S., and Nei, M. (2000) Molecular Evolution and Phylogenetics, pp. 24-25, Oxford University Press, New York
40. Grishin, N. V. (1995) J. Mol. Evol. 41, 675-679
41. Feng, D. F., Cho, G., and Doolittle, R. F. (1997) Proc. Natl. Acad. Sci. U. S. A. 94, 13028-13033
42. Gu, X., Wang, Y. F., and Gu, J. Y. (2002) Nat. Genet. 31, 205-209
43. Kumar, S., and Hedges, S. B. (1998) Nature 392, 917-920
44. Tajima, F. (1993) Genetics 135, 599-607
45. Desbois, C., Hogue, D. A., and Karsenty, G. (1994) J. Biol. Chem. 269, 1183-1190
46. Patthy, L. (1987) FEBS Lett. 214, 1-7
47. Zhou, A., Webb, G., Zhu, X., and Steiner, D. F. (1999) J. Biol. Chem. 274, 20745-20748
48. Price, P. A., Rice, J. S., and Williamson, M. K. (1994) Protein Sci. 3, 822-830
49. Creighton, T. E. (1993) Proteins: Structures and Molecular Properties, pp. 121-123, W. H. Freeman, New York
50. Volk, T. (1992) Development 116, 721-730
51. Li, T., Yang, C. T., Jin, D. Y., and Stafford, D. W. (2000) J. Biol. Chem. 275, 18291-18296
52. Hale, J., Williamson, M., and Price, P. (1991) J. Biol. Chem. 266, 21145-21149
53. Price, P. A., Fraser, J. D., and Metz-Virca, G. (1987) Proc. Natl. Acad. Sci. U. S. A. 84, 8335-8339
54. Wajih, N., Borras, T., Xue, W., Hutson, S. M., and Wallin, R. (2004) J. Biol. Chem. 279, 43052-43060
55. Viegas, C. S. B., Pinto, J. P., Conceição, N., Simes, D. C., and Cancela, M. L. (2002) Gene 289, 97-107
56. Cotton, J., and Page, R. (2002) Proc. R. Soc. Lond. B Biol. Sci. 269, 1555-1561
57. Postlethwait, J., Yan, Y., Gates, M., Horne, S., Amores, A., Brownlie, A., Donovan, A., Egan, E., Force, A., Gong, Z., Goutel, C., Fritz, A., Kelsh, R., Knapik, E., Liao, E., Paw, B., Ransom, D., Singer, A., Thomson, M., Abduljabbar, T., Yelick, P., Beier, D., Joly, J., Larhammar, D., and Rosa, F. (1998) Nat. Genet. 18, 345-349
58. Lamatsch, D. K., Steinlein, C., Schmid, M., and Schartl, M. (2000) Cytometry 39, 91-95
59. Nishimoto, S. K., Waite, J. H., Nishimoto, M., and Kriwacki, R. W. (2003) J. Biol. Chem. 278, 11843-11848

Copyright © 2005 by the American Society for Biochemistry and Molecular Biology.