Molecular Evolution of the Tissue-nonspecific Alkaline Phosphatase Allows Prediction and Validation of Missense Mutations Responsible for Hypophosphatasia*

Background: We used evolutionary analysis of alkaline phosphatase (tissue nonspecific alkaline phosphatase, TNSALP) to predict missense mutations leading to hypophosphatasia. Results: We found 469 sensitive positions and validated 99% of the 204 known mutations. Conclusion: It is a more powerful method than in silico models to validate missense mutations in TNSALP. Significance: Such an approach should be widely used to support genetic diagnostics.
ALPL encodes the tissue nonspecific alkaline phosphatase (TNSALP), which removes phosphate groups from various substrates. Its function is essential for bone and tooth mineralization. In humans, ALPL mutations lead to hypophosphatasia, a genetic disorder characterized by defective bone and/or tooth mineralization. To date, 275 ALPL mutations have been reported to cause hypophosphatasia, of which 204 were simple missense mutations. Molecular evolutionary analysis has proved to be an efficient method to highlight residues important for the protein function and to predict or validate sensitive positions for genetic disease. Here we analyzed 58 mammalian TNSALP to identify amino acids unchanged, or only substituted by residues sharing similar properties, through 220 million years of mammalian evolution. We found 469 sensitive positions of the 524 residues of human TNSALP, which indicates a highly constrained protein. Any substitution occurring at one of these positions is predicted to lead to hypophosphatasia. We tested the 204 missense mutations resulting in hypophosphatasia against our predictive chart, and validated 99% of them. Most sensitive positions were located in functionally important regions of TNSALP (active site, homodimeric interface, crown domain, calcium site, …). However, some important positions are located in regions whose structure and/or biological function are still unknown.
Our chart of sensitive positions in human TNSALP (i) makes it possible to validate or invalidate, at low cost, any ALPL mutation suspected of being responsible for hypophosphatasia, in contrast to time-consuming and expensive functional tests, and (ii) displays higher predictive power than in silico prediction models.

heat stability, its allosteric behavior, and its interaction with the extracellular matrix, particularly collagen (18-20).
Hypophosphatasia is an autosomal disease caused by homozygous, heterozygous, or compound heterozygous ALPL mutations and can be either dominant or recessive. The phenotype is principally characterized by disorders in bone and tooth mineralization, and clinical symptoms vary from moderate (as in the case of rickets, osteomalacia, or dental defects) to severe when deep hypomineralization of the skeleton occurs. Although this disease has a low prevalence (1/300,000 in Europe), more than 275 mutations are reported in the literature to date (December 2013) (23), and of these 75% (204) result from amino acid substitutions (missense mutations).
The large range of genotypes and phenotypes associated with this disease makes it difficult to (i) establish genotype-phenotype relationships and (ii) predict the mutation severity, because most of the patients have a unique genotype. Therefore, developing tools for predicting and validating missense mutations leading to hypophosphatasia is crucial. Furthermore, it is critical that these tools are based on a deep understanding of the protein, particularly the importance of its functional and structural domains. Herein, we show that evolutionary analysis represents an extremely promising method for achieving this, and we anticipate that it will have a significant impact in this field.
The increasing number of ALPL sequences found in public databases (NCBI and Ensembl), coming from genome and/or transcriptome sequencing in numerous vertebrate species, enabled us to develop a new method for predicting disease mutations based on the evolutionary molecular analysis of protein genes (24,25). This method involves comparing a large number of primary sequences of a given protein arranged into a phylogenetic context (i.e. based on species relationships) with the aim of identifying sensitive positions. Such evolutionary analysis allows us to (i) understand how the protein and its various regions evolved; (ii) identify regions, motifs, and residues that play important functions because they were conserved during long geological periods (hundreds of millions of years; Ma = million years); and (iii) predict and/or validate point mutations that could lead to a genetic disease, and ultimately estimate their degree of severity. Such a method, which uses evolutionary analysis to predict human diseases, was recently called phylomedicine (25). It was shown that at least 95% of the amino acid substitutions already known to be responsible for a genetic disease are validated with this method (24). Recently, our group successfully used this approach for predicting sensitive positions in various proteins of the secretory calcium-binding phosphoprotein family (26), and showed that missense mutations lead to genetic diseases. This approach was demonstrated on the following proteins: amelogenin (27), enamelin (28), matrix extracellular phospho-glycoprotein (29), amelotin (30), dentin matrix protein 1 (31), and ameloblastin (footnote 6). Here, we perform the evolutionary molecular analysis of mammalian TNSALP to (i) highlight the functional or structural importance of various positions and domains; (ii) predict sensitive positions that should be responsible for a genetic disorder when substituted; and (iii) validate our predictions by using missense mutations reported in humans.

ALPL Sequences and Alignment
A total of 58 ALPL sequences, representative of the main mammalian lineages, were extracted from the NCBI and Ensembl databases. They comprised nine published, full-length cDNA sequences, 14 computer-predicted sequences available from sequenced mammalian genomes, and 35 sequences obtained using the Basic Local Alignment Search Tool (BLAST) in genomes currently being sequenced (Table 1). The unpublished sequences were validated through alignment to the published cDNA sequences using Se-Al v.2.0a11 software (32). The intron-exon boundaries were carefully checked. When necessary, the sequences were completed and/or corrected either by BLAST searches in the Ensembl genome database, the NCBI trace archives, and the whole genome shotgun repository, or by de novo sequencing of PCR products amplified from genomic DNA extracted from ethanol-preserved tissues (see Table 2 for primers used). The 58 nucleotide sequences were translated into amino acid sequences and aligned to human TNSALP, chosen as the reference sequence in our alignment. The 58 coding ALPL sequences are available online.
Our final alignment consisted of 528 positions, including only four insertions (supplemental Table S1). A few residues were missing in this alignment (395 nucleotides), representing less than 1.3% of the data, and the corresponding positions were treated as "unknown data."

Evolutionary Analyses
Substitution Models-The substitution models for both the nucleotide and amino acid alignments were defined using the online model selection tool Datamonkey (33). Briefly, for each alignment, the software defines the best model by testing all possible evolutionary models. Here, the two best substitution models were (i) the Tamura-Nei model (34) for the nucleotide alignment; and (ii) the Jones et al. model (35) for the amino acid alignment.
Distance Tree-This approach is useful for the evolutionary analysis as it highlights variable sequences related to either large evolutionary distances or high substitution rates with regard to amino acid conservation through mammalian evolution. Pairwise distances among the 58 aligned mammalian ALPL sequences were computed using Molecular Evolutionary Genetics Analysis 5.2.1 (MEGA5) (36). The alignment was restricted to the length of the human ALPL sequence by excluding the 12-nucleotide insertions in non-human sequences and the stop codon. This procedure resulted in a dataset containing 1572 nucleotide sites (524 amino acids). This alignment was supplemented by a mammalian tree taking into account the recent mammalian phylogeny (37). Maximum likelihood phylogenetic reconstruction was performed from the nucleotide alignment using the HYpothesis testing using PHYlogenies (HYPHY) software (38). For each nucleotide, the probability of the observed data was calculated using the maximum likelihood method and taking into account the supplied tree.

6 F. Delsuc, B. Gasse, and J.-Y. Sire, personal data.

TABLE 1
Preferred common names, scientific names, families, orders, and references in GenBank™ for the 58 mammalian species from which ALPL sequences were used in our study
The species are arranged in alphabetical order of common names.
Analysis of Functional Constraints-As previously described (29-31), the strong constraints corresponding to structural/functional regions were searched on the nucleotide and amino acid alignments by Sliding Windows analysis in the HYPHY software and by SLAC analysis (Datamonkey) (39,40). For the Sliding Windows analysis, the logarithm of the probability was calculated for a window of 15 base pairs (bp) with an overlap of 5 bp between each window and using a likelihood algorithm taking into account phylogenetic relationships.
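The window layout described above (15-bp windows overlapping by 5 bp, i.e. a 10-bp step) can be sketched as follows. This is only an illustration of the coordinates; the function name `sliding_windows`, the half-open interval convention, and the choice to let the last window be shorter are assumptions, and the likelihood calculation itself (done in HYPHY) is not reproduced.

```python
def sliding_windows(length, window=15, overlap=5):
    """Return half-open (start, end) intervals tiling `length` sites.

    Consecutive windows share `overlap` sites, so the step between
    window starts is window - overlap (10 bp with the defaults).
    Assumption: the final window is truncated at the sequence end.
    """
    step = window - overlap
    if step <= 0:
        raise ValueError("overlap must be smaller than window")
    windows = []
    start = 0
    while start < length:
        end = min(start + window, length)
        windows.append((start, end))
        if end == length:
            break
        start += step
    return windows


# 1572 nucleotide sites, as in the restricted ALPL alignment
windows = sliding_windows(1572)
```

With these defaults each window starts 10 bp after the previous one, so neighbouring windows share exactly 5 sites.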
Analysis of Amino Acid Substitutions-Site-specific selections (i.e. biologically significant amino acids) in TNSALP were identified in our alignment for the 524 positions and displayed on the human sequence. The analysis was performed according to the substitution preferences of amino acids, i.e. favoring property conservation (41) (Table 3). We defined three levels of selection: conserved (i.e. unchanged residues during 220 Ma), conservative (i.e. substituted residues having similar properties), and variable positions (i.e. substitution with various residues).
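The three levels of selection defined above can be illustrated with a minimal sketch that classifies each alignment column. The property groups below are a simplified, hypothetical grouping chosen for illustration; they are not the substitution preferences of Table 3 (Ref. 41), and the names `PROPERTY_GROUPS`, `classify_column`, and `classify_alignment` are assumptions.

```python
# Hypothetical, simplified property groups (NOT the Table 3 preferences)
PROPERTY_GROUPS = [
    set("AVLIMC"),  # aliphatic / hydrophobic
    set("FWY"),     # aromatic
    set("STNQ"),    # polar, uncharged
    set("KRH"),     # basic
    set("DE"),      # acidic
    set("G"),
    set("P"),
]


def classify_column(residues):
    """Classify one alignment column as conserved, conservative, or variable."""
    # Gaps and unknown data are ignored, as "unknown data" in the alignment
    observed = {r for r in residues if r not in "-X?"}
    if not observed:
        raise ValueError("column contains no residues")
    if len(observed) == 1:
        return "conserved"
    if any(observed <= group for group in PROPERTY_GROUPS):
        return "conservative"
    return "variable"


def classify_alignment(sequences):
    """Classify every column of equal-length aligned sequences."""
    return [classify_column(column) for column in zip(*sequences)]


# Toy alignment of three sequences (illustrative only)
toy = ["MDKA", "MEKS", "MDRW"]
levels = classify_alignment(toy)
```

In this scheme, conserved and conservative columns together would correspond to what the text calls sensitive positions.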
Prediction of Post-translational Modifications-We looked for putative N-glycosylation and phosphorylation sites (casein kinase II, protein kinase C, and tyrosine kinase phosphorylation sites) in human and some mammalian sequences using the MyHit database. We calculated the theoretical pI (isoelectric point) of different motifs with the Compute pI/Mw tool (ExPASy).

Sequence Comparison of Mammalian TNSALP
With a total of 58 species from 22 orders, our dataset is well representative of the current mammalian TNSALP diversity and it covers circa 220 Ma of evolution (Table 1). The length of the coding ALPL sequence, from ATG to the stop codon, was identical in 53 species: 1575 bp resulting in 524 amino acids from the N-terminal methionine, Met 1, found in the region encoded by exon 2, to the phenylalanine preceding the stop codon, Phe 524, encoded by exon 12 (supplemental Table S1). Short variations were observed in only five TNSALP sequences. In the region encoded by exon 2, one or two amino acids were deleted before the cleavage site of the signal peptide in Marsupialia (wallaby, Tasmanian devil, and opossum); in the region encoded by exon 3, two deletions and one insertion are found in an afrotherian (hyrax); in the region encoded by exon 11, one deletion in a chiropteran (megabat); and in the region encoded by exon 12, one deletion in opossum and wallaby, and three insertions in Marsupialia. Aside from these variations, 303 residues were found unchanged in TNSALP during 220 Ma of evolution (supplemental Table S1).

Distance Tree
To define whether all sequences were relevant for our evolutionary analysis, we calculated the pairwise distance for each sequence in the alignment. This method allowed us to quantify the evolutionary distance for each taxon. This distance was shown on a maximum likelihood tree (Fig. 1). First, we observed that the substitution rates were low for TNSALP (≤0.11) compared with the disordered proteins previously studied in our group (DMP1, AMTN, and MEPE), which display larger branch lengths indicating rapidly diverging sequences (for instance, 0.332 and 0.602 for hedgehog and platypus DMP1, respectively). Substitution rates of TNSALP were particularly low in primates, as illustrated by short branches, e.g. from 0.00365 between human and chimpanzee to 0.01199 between marmoset and squirrel monkey, except for bushbaby (0.07601). Other sequences have higher substitution rates, as shown by the long branches of platypus (0.11082), opossum (0.08267), Tasmanian devil (0.07516), and shrew (0.06991), but taken together the low substitution rates demonstrated that all sequences were relevant for the evolutionary analysis and that TNSALP was subjected to a high selection pressure in all species.
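As a rough illustration of what a pairwise distance measures, the sketch below computes the uncorrected p-distance (proportion of differing sites) between two aligned sequences. This is a deliberately simpler stand-in, not the model-based distances computed in MEGA5 for Fig. 1; the function name `p_distance` and the set of characters treated as missing data are assumptions.

```python
def p_distance(seq_a, seq_b, missing="-N?"):
    """Proportion of compared sites that differ between two aligned sequences.

    Sites where either sequence has a gap or unknown character are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in missing or b in missing:
            continue
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


# Toy sequences (illustrative only, not ALPL data)
d1 = p_distance("ATGCATGCAT", "ATGCATGAAT")  # 1 difference over 10 sites
d2 = p_distance("ATG-AT", "ATGCAA")          # gap column is skipped
```

Model-based distances such as those reported in the text additionally correct for multiple substitutions at the same site, so they are generally larger than the raw p-distance for divergent sequences.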

TABLE 3
Favored, neutral, and disfavored amino acid substitutions in proteins
The order of amino acids shows the substitution preferences, from the most favored to the most disfavored. From Ref. 41.

Analysis of Functional Constraints
Sliding Window analysis indicated that the TNSALP sequence is characterized by an alternation of high and low values of maximum likelihood, reflecting weak and strong selective pressures, respectively (Fig. 2A). The non-synonymous substitution rate (dN) obtained with SLAC analysis confirmed these observations and allowed us to identify various domains more precisely (Fig. 2B). The region encoded by the 3′ extremity of exon 2, corresponding to the signal peptide cleavage site, is well conserved, indicating a strong functional pressure. The regions encoded by the 3′ and 5′ extremities of consecutive exons are also strongly constrained, especially exons 3-4, 4-5, 9-10, and 11-12. The analysis confirms that several already known functional motifs are subjected to selective pressure: the homodimeric interfaces, principally those encoded by exons 4 and 5, and the active sites encoded by exons 5, 6, 10, and 12.

Putative Post-translational Modifications
In human TNSALP, MyHit confirmed the presence of the five putative N-glycosylation sites (motif = NX(S/T/C)) already reported in the literature (140 NXT 142, 230 NXT 232, 271 NXT 273, 303 NXT 305, and 430 NXS 432) (Fig. 3). Although these sites are generally well conserved through 220 Ma, our evolutionary analysis revealed that some mammalian TNSALP are lacking correct components of the site: 140 (Fig. 3). Our analysis indicated that five putative phosphorylation sites were conserved during mammalian evolution. They are: three casein kinase II (188 SDNE 191, 245 TRLD 248, and 308 SLSE 311) and two protein kinase C (166 TTR 168 and 389 TPR 391) phosphorylation sites. MyHit did not identify any putative O-glycosylation site.
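A scan for the N-glycosylation motif quoted above, NX(S/T/C), can be sketched with a regular expression. This is a minimal illustration and not the MyHit implementation; the name `find_sequons` and the toy sequences are assumptions. The pattern follows the motif as written in the text, with X being any residue; many definitions of the sequon additionally exclude proline at X, which this sketch does not do.

```python
import re

# Lookahead so that overlapping motifs (e.g. N-N-T-S) are all reported
SEQUON = re.compile(r"(?=N[A-Z][STC])")


def find_sequons(protein, offset=1):
    """Return (position, tripeptide) for each N-X-(S/T/C) match.

    Positions are 1-based by default, matching residue numbering.
    """
    hits = []
    for match in SEQUON.finditer(protein.upper()):
        i = match.start()
        hits.append((i + offset, protein[i:i + 3].upper()))
    return hits


# Toy peptides (illustrative only, not the TNSALP sequence)
hits = find_sequons("MANETLLNGSAQNPA")
overlapping = find_sequons("ANNTS")
```

Running such a scan on each of the 58 aligned sequences and comparing the hit positions is one simple way to see in which species a given site lacks a correct component.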

Analysis of Amino Acid Substitutions
Of the 524 amino acids composing the human TNSALP sequence, 469 (89.5%) were identified as sensitive positions, i.e. that were either conserved (i.e. unchanged residues) (303 residues, 57.8%) or conservative (i.e. substituted with residues having similar properties) (166 residues, 31.7%) during 220 Ma of mammalian evolution. Only 55 positions (10.5%) were identified as variable (i.e. substituted with various residues). These unchanged, conservative, and variable positions were reported on the human sequence, which constitutes the chart of sensitive positions of human TNSALP (Fig. 3). We predict that any substitution occurring at one of the 303 unchanged positions or any substitution of one of the 166 conservative positions with a residue having a different property would disrupt TNSALP function and lead to hypophosphatasia.
All substitutions on conservative and variable positions found in the 58 TNSALP sequences were reported in the Online Resource 4. Our analysis confirms that the protein is strongly constrained and indicates that each of the 469 sensitive positions plays an important role of either structural or functional significance.
It is worth noting that the percentage of purifying selection is high along the protein sequence, and varies from 80% in the region encoded by exon 2 to 96.6% in the region encoded by exon 6. More precisely, our evolutionary analysis (i) confirmed already known important sites and domains of the protein and (ii) highlighted new sites having putative important roles (Fig. 3).

Confirmation of Important Sites and Domains
The residues and motifs known to play an important role in the proper function of TNSALP are briefly reviewed below, along the protein sequence, in the light of our evolutionary analysis. See "Discussion" for a more detailed description of the conservation of these important positions.
In addition to these conserved positions, two previously identified large regions encoded by several exons, the Ca2+-binding domain and the crown domain, are also well conserved (Fig. 3).

Other Sites
In addition to the important regions confirmed above, various residues of unknown function are well conserved in numerous regions, and are located far from the active sites. Three motifs seem to play an important role: 126 GTVGVSAA 133 and 144 GNEVTSILRWA 154 , encoded by exon 5, and 348 ALHEAVEMDRAI 359 , encoded by exon 10. Three-dimensional modeling shows that these three regions are located at the surface of the molecule (Fig. 4: see "Discussion").

Validation of ALPL Mutations in Humans
Our Predictive Evolutionary Chart-So far, the 204 missense mutations listed in the ALPL mutation database affect only 145 positions, because more than one amino acid substitution can occur at the same position (supplemental Table S2). These mutated positions were indicated on the human TNSALP sequence and compared with the sensitive positions identified in our evolutionary analysis (Fig. 3). Among the 145 positions affected in humans, 113 were unchanged positions and 31 were conservative positions, both identified as sensitive positions in our evolutionary analysis, and one was a variable position. All human mutations on a conserved position and 8 on a conservative position were directly validated by our evolutionary analysis, which represents 82.8% of the affected positions in humans. In 23 of 24 conservative positions, the substituting amino acids were never found in nature (supplemental Table S2). Although our evolutionary analysis could not initially predict such substitutions as being deleterious, the fact that they were not shared by any mammalian TNSALP validates them secondarily. The same reasoning can be applied to the only variable position, Asp 406, in which two deleterious substitutions were found in humans (p.D406N and p.D406G). Indeed, the two substituting residues were not found in the wild (supplemental Table S2). Therefore, our evolutionary analysis validated 99.3% of the deleterious missense mutations observed in human TNSALP.
Figure legend (fragment): Regions and residues are subjected to functional constraints (i.e. high selective pressure) when (i) Ln likelihood is close to 0, and (ii) the rate of non-synonymous substitution (i.e. changing the residue) is low.
Ultimately, only one missense mutation (0.7%) did not fit with our evolutionary analysis. This mutation was reported on a conservative position (p.A177T), but was not predicted as deleterious because the same substitution occurred in the wild (supplemental Table S2).
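The validation logic described in this section can be summarized as a small decision rule. The sketch below is an interpretation of the text, not the authors' software; the function name `assess`, its arguments, and the outcome labels are assumptions, and the residue sets passed in the examples are hypothetical placeholders (the real sets are those of supplemental Table S2).

```python
def assess(position_class, observed, mutant, similar_to_wild_type):
    """Apply the chart to one human substitution.

    position_class: "unchanged", "conservative", or "variable"
    observed: residues seen at this position in the mammalian alignment
    mutant: substituting residue found in the patient
    similar_to_wild_type: True if mutant shares the wild-type properties
    """
    if position_class not in ("unchanged", "conservative", "variable"):
        raise ValueError("unknown position class")
    # Any change at an unchanged position is predicted deleterious
    if position_class == "unchanged":
        return "predicted deleterious"
    # At a conservative position, a residue with different properties is too
    if position_class == "conservative" and not similar_to_wild_type:
        return "predicted deleterious"
    # Otherwise: validated secondarily if never seen in any mammal
    if mutant not in observed:
        return "secondarily validated"
    return "not predicted"


# p.A177T: conservative position, Thr also occurs in other mammals (from the text)
a177t = assess("conservative", {"A", "T"}, "T", True)
# p.D406N: variable position; the observed set here is a hypothetical placeholder
d406n = assess("variable", {"D", "E"}, "N", False)
```

Under this reading, p.A177T is the single case that falls through to "not predicted", which matches the one mutation reported as not fitting the analysis.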
Other in Silico Prediction Tools-To date, several in silico models dedicated to predicting the effects of amino acid substitutions in humans are available and easy to access (for review, see Ref. 42). We list the results obtained with TNSALP using five software tools (see Table 4 and supplemental Table S3). The tools used to predict the effect of substitutions fall into three categories: (i) sequence and evolutionary conservation-based methods, such as SIFT or Mutation Assessor, which base their prediction on sequence conservation in a multiple sequence alignment and therefore depend on the alignment provided; (ii) protein sequence and structure-based methods, such as PolyPhen-2 or MAPP, which combine amino acid properties, location of functional sites, secondary structure, and membrane topology; and (iii) supervised-learning methods, such as SNAP, whose algorithms include neural networks, support vector machines, random forests, and naive Bayes classifiers. The predicted mutations are considered as true positives, the nonpredicted mutations as false negatives, and the false predictions as false positives.
PolyPhen-2 is software based on both sequence and structural information that classifies mutations as deleterious or benign. This tool provided a correct prediction for 93.6% of the missense mutations known in ALPL, but erroneously considered 47 substituted positions, i.e. 9% of the 524 residues, with amino acids found in nature as damaging the protein function.
MAPP (Multivariate Analysis of Protein Polymorphism) is software combining both evolution and physicochemical characteristics such as hydropathy, polarity, charge, volume, and free energy in α-helix and β-strand conformations. This powerful software detects "good" and "bad" amino acids for each position. All known mutations of TNSALP were found except K264R and R450H. Nevertheless, MAPP also detected as bad 139 substituted positions (26.5%) with residues found in nature.
Mutation Assessor is designed to predict the functional impact of amino acid substitutions, through evolutionary conservation in the protein family and subfamily. This software provided a correct prediction for 82.8% of TNSALP missense mutations. A total of 109 moderate phenotypes were predicted by the software, but 10 of the 24 moderate phenotypes reported in the literature were not validated. Similarly, Mutation Assessor predicted 59 positions as leading to a severe phenotype when substituted, but did not validate seven of the severe phenotypes known in the literature. Also, Mutation Assessor wrongly detected 48 substituted positions (9.1%) with residues found in nature as leading to diseases.
SIFT (Sorting Intolerant From Tolerant) identifies deleterious mutations using only an evolutionary approach. This method is based on the degree of conservation of amino acids in sequence alignments derived from closely related sequences obtained by PSI-BLAST. This software did not detect 50% of the mutations known in the literature and erroneously predicted 30 amino acids (5.7%) as nontolerated although they are found in nature (e.g. K27W, N53K, ...).
SNAP (Screening for Non-acceptable Polymorphisms) is a neural network that uses derived protein information, such as secondary structure, evolutionary conservation, and solvent accessibility, to predict the effect of mutations on the protein. This tool validated 69% of the missense mutations reported in the literature for TNSALP, but made false predictions for 18 positions (3.4%) by considering residues as nonneutral for the protein function, whereas they are observed in nature. These predictions and validations are commented upon under "Discussion."

DISCUSSION
The evolutionary molecular analysis of mammalian TNSALP yields important new information by (i) pointing precisely to residues and motifs of crucial structural and/or functional importance through evidence of selective constraints; (ii) predicting which positions could result in a genetic disorder when substituted; and (iii) validating all but one missense mutation reported so far in humans. In addition, in conservative and variable positions, comparing deleterious missense mutations in humans with the substitutions found in other mammals allowed us to detail the range of possible substitutions at these positions in TNSALP.
Selective Constraints and Evidence of Functional Residues and Motifs-First, our study demonstrated that TNSALP is a highly constrained protein, as illustrated by 89.5% sensitive positions identified through 220 Ma of mammalian evolution. This high percentage of conservation during such a long geological period reflects strong selective constraints acting on the protein at the amino acid level. This also reveals that the protein needs to be highly structured over its full length for the enzyme to function correctly. This high selective pressure along the whole sequence length contrasts with the low constraints observed in unstructured (disordered) proteins previously studied: AMELX (40% of sensitive positions (27)), ENAM (7% (28)), MEPE (14% (29)), AMTN (8% (30)), DMP1 (20% (31)), and AMBN (18%) (footnote 7). This corroborates the crucial importance of TNSALP in mammals and its involvement in various functional, but also structural, roles, not yet exhaustively identified.
Active Sites and Their Neighboring Residues-We found that the 14 previously identified active sites of TNSALP and their flanking residues were unchanged during 220 Ma, which confirms their crucial importance for the proper function of the protein. More precisely, these are: the 58 LGDGM 62 motif, including Asp 60, which plays a role as ligand for metal ions, Zn2+ (zinc site 2) and Mg2+ (43); the 107 VPDSAG 112 motif, with Ser 110, which covalently binds phosphate during catalysis (44) (the residue Tyr 117, located close to this active site, is highly conserved and lies in the 12-Å sphere surrounding the phosphate group); the 171 HATPS 175 motif, containing His 171 and Thr 173, described as indirect ligand and ligand of Mg2+ (magnesium site), respectively; the 332 EGGRIDHGHH 341 motif, including two residues of the magnesium site, Glu 332, which is a direct ligand, and His 338, an indirect ligand, and two ligands of Zn2+ (zinc site 1), Asp 337 and His 341 (in addition, in this motif, His 340 is involved in a hydrogen bond and plays an important role in stabilizing the active site environment (45)); the 377 ADHSH 381 motif, in which Asp 378, His 379, and His 381 are occupied by a Zn2+ ion (zinc site 2); and finally, the 451 HETHG 455 motif, housing His 454, which is important for the zinc site 2. TNSALP also contains a Ca2+-binding site, which is non-catalytic and possesses 76 residues (from Gly 222 to Leu 309), including Phe 290, Glu 291, and Asp 307 that were unchanged during evolution and bind calcium (17,46).
Peripheral Binding Site-The Ca2+-binding site also includes a peripheral binding site, which is constituted by residues located on two α-helices: Trp 270-Leu 278 and Ser 308-Gln 318 (47). Among them, five residues (Trp 270, Arg 272, Leu 275, Asp 306, and Glu 311) were unchanged during mammalian evolution. This finding highlights their crucial importance for the structure/function of the enzyme, although their precise role is still unknown.
Disulfide Bonds-The two disulfide bonds of TNSALP, Cys 139 -Cys 201 , and Cys 489 -Cys 497 , were unchanged, indicating their important role in the stability of the protein structure (45).
N-terminal Fragment-The domain Lys 27-Glu 42 located in the N-terminal region forms an α-helix, which interacts with specific residues to stabilize the protein and allow catalysis (16). Of the 16 residues composing this domain, seven were unchanged: Tyr 28, Trp 29, Gln 32, Gln 34, Thr 36, Leu 37, and Leu 41. This finding underlines the essential role that these amino acids play for TNSALP function and/or structure.
Crown Domain-The crown domain is an essential domain, exclusively found in mammalian TNSALP. This domain is composed of 65 amino acids (from Gly 387 to His 451), most of them being well conserved during evolution (only three variable positions), a finding that points to the important function of this domain. It includes two residues essential for their uncompetitive inhibition properties: Tyr 388, which is unchanged, and His 451, which is on a conservative position. The latter residue plays a role in substrate binding of the nearby active site (see above). Within this crown domain, the region Gly 425-His 451 was shown to be a collagen binding site (20). Our evolutionary analysis confirmed the importance of this region and allowed us to refine the length of this binding site, narrowing the functional domain to 16 amino acids, Val 434-Leu 449. This motif is more acidic, with a pI of 5.08, allowing the electrostatic interaction with the N-telopeptide of the collagen fibrils. However, the high affinity of this region has to be tested in vitro.
Ionic Pocket-A highly ionic pocket was described as formed by six residues (Asp 109, Glu 125, Gly 126, Arg 184, Tyr 187, and Tyr 388) (43). These positions are unchanged except for Glu 125, which is substituted with His 125 in Tasmanian devil (Marsupialia). The ionic pocket is specific to TNSALP, and these residues are described as constituting a hydrophobic pocket in other ALPs.
Homodimeric Interface-Le Du et al. (17) identified 83 amino acids involved in the homodimeric interface. Using the three-dimensional model published by Mornet et al. (46) and our evolutionary analysis, we identified nine residues or motifs playing an important role in this interface. Indeed, of the 83 amino acids involved in the homodimeric interface, 92% are on sensitive positions (64% unchanged and 28% conservative), which allowed us to conclude that the homodimeric interface is highly constrained and plays an essential role for the correct function of the protein. This must be related to the observation that ALPs are active only in homodimeric form in all species, including prokaryotes (48).
Post-translational Modifications-Although the five putative N-glycosylation sites identified in human TNSALP were rather well conserved during mammalian evolution, slight differences observed in some species indicate that four of these sites were probably not functional in the last common mammalian ancestor; only 303 NXT 305 was kept unchanged during mammalian evolution. We can conclude that either the four other sites are not glycosylated or that glycosylation occurred late in mammalian evolution: in this case, 230 NXP 232 and 271 NXT 273 glycosylations could have occurred in an ancestral Placentalia, and 140 NXT 142 and 430 NXS 432 in an ancestral Primate. Nosjean et al. (22) showed that cleavage of N-glycosylations removed human TNSALP activity, with the exception of Asn 230 and Asn 303. This finding seems to support that Asn 140, Asn 271, and Asn 430 glycosylations occurred recently in mammalian evolution and that Asn 230 is not glycosylated. In contrast, the evolutionary conservation of Asn 303 could indicate either the presence of an N-glycosylation or another function for this site. Studying TNSALP in nonmammalian species could bring more accurate information on the history of these sites.
This study reports for the first time the presence of putative phosphorylation sites. Of 18 sites, five were found unchanged during mammalian evolution and could be functional, but their presence should be experimentally tested.
New Putatively Important Sites-Three highly conserved motifs have not so far been reported in the literature as being functionally or structurally important for TNSALP (Fig. 4). 126 GTVGVSAA 133 (Fig. 4A) and 144 GNEVTSILRWA 154 (Fig. 4C) are both encoded by exon 5 (chain X), but in the first motif, the first three amino acids are located in the 12-Å sphere around the phosphate group (49). The third motif, 348 ALHEAVEMDRAI 359 (Fig. 4B), is encoded by exon 10 (chain Y) and contains an acidic cluster, giving a pI of 4.65, more acidic than the collagen binding site. These three sequences are not referenced in the peptide database. Interestingly, the three-dimensional model shows that these three motifs are at the surface of the molecule and very exposed, A and C on the front side and close enough to each other to form a unique domain, and B on the rear side (Fig. 4). The A and C motifs are also very close to the N-terminal α-helix of the other monomer and probably contribute to the homodimer interface, whereas the role of motif B remains to be elucidated.
Predictions and Validations of Human Mutations-Our evolutionary molecular analysis of mammalian TNSALP highlights the functional and/or structural importance of 469 unchanged or conservative positions on the human sequence. All these positions are predicted to be sensitive, i.e. their change will result in hypophosphatasia. By contrast, three well known polymorphisms of the ALPL gene, p.R152H, p.Y263H, and p.V522A, were found in conservative (p.R152H) and variable (p.Y263H and p.V522A) positions, and the conservative histidine residue at position 152 was observed in various other species, supporting the benign effect of the change. Thus, this predictive chart of sensitive positions on the human TNSALP sequence could be a useful tool for clinicians when identifying a new missense mutation of ALPL in a patient.
When the 145 positions known to be affected by missense mutations in human TNSALP are compared with our predictive chart (Fig. 3), 120 of these positions (82.75%) correspond to mutations that would have been directly predicted to be deleterious because they occurred at sensitive positions. Of the 25 missense mutations not directly predicted, 23 are located at a conservative position that was substituted with a residue having similar chemical properties but never found in nature. This means that (i) our table of residue substitutions at each position in the 58 mammalian species (supplemental Table S2) validates the deleterious effect of substitutions observed in humans, increasing the percentage of prediction to 98.6%, and (ii) the tolerance for changes affecting conservative positions must be more accurately specified in terms of similar properties. Indeed, not all residues sharing similar chemical properties can substitute for the functional amino acid. Structural and/or three-dimensional properties should also be considered, probably in relation to the highly constrained structure of the protein.
The last two missense mutations of TNSALP that were not directly predicted by our chart concern (i) conservative position 177, which is similarly mutated in nature, and (ii) variable position 406. By definition, a variable position could be substituted by any residue without any deleterious effect. This is not the case for position 406. In humans, two missense mutations are reported at this position, p.D406N and p.D406G, but these substitutions were never observed in the 58 species studied, which are well representative of the mammalian lineages. In other words, if the aspartic acid at position 406 could be substituted by either asparagine or glycine without any deleterious effect, such a substitution would have occurred at random in one of the studied species during 220 million years of evolution. Here again our substitution table predicts deleterious substitutions, probably related to large differences in the properties of the substituted residues, and increases the percentage of prediction to 99.3%. The human mutation p.A177T at a conservative position was unpredictable because the substituting amino acid reported as responsible for the disease in humans (Thr) was also found in three mammalian species (Myotis lucifugus, Chrysochloris asiatica, and Echinops telfairi). The mutation was reported by Goseki-Sone et al. (50) in a patient with adult hypophosphatasia, and the authors tested 111 unrelated genomic DNAs to exclude a polymorphism. It remains possible, however, that this mutation is in fact a very rare polymorphism in the human population.
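The successive prediction percentages quoted above follow from simple counts, as the short Python check below shows. The variable names are illustrative; the counts are those given in the text (145 affected positions, 120 directly predicted, 23 validated at conservative positions through the substitution table, and one validated at variable position 406). Rounded to two decimals, 120/145 gives 82.76%.

```python
TOTAL_POSITIONS = 145        # positions affected by missense mutations
DIRECTLY_PREDICTED = 120     # mutations falling on sensitive positions
VALIDATED_CONSERVATIVE = 23  # similar residue, never observed in nature
VALIDATED_VARIABLE = 1       # position 406 (p.D406N, p.D406G)


def pct(count, total=TOTAL_POSITIONS):
    """Percentage of positions, rounded to one decimal place."""
    return round(100 * count / total, 1)


step1 = DIRECTLY_PREDICTED
step2 = step1 + VALIDATED_CONSERVATIVE
step3 = step2 + VALIDATED_VARIABLE

if __name__ == "__main__":
    for label, count in [
        ("chart alone", step1),
        ("+ substitution table, conservative positions", step2),
        ("+ substitution table, position 406", step3),
    ]:
        print(f"{label}: {count}/{TOTAL_POSITIONS} = {pct(count)}%")
```

The single position left unaccounted for is 177 (p.A177T), whose substituting residue is found in three mammalian species.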
Among the 204 missense mutations reported so far in the database of ALPL mutations responsible for hypophosphatasia, the clinical spectrum of patients is highly variable, ranging from the most severe form, presenting with stillbirth without mineralized bone, to the most moderate form, presenting with tooth loss without bone symptoms. On the basis of clinical reports, recurrent genotypes, homozygous genotypes, and site-directed mutagenesis experiments performed in our group or by others (see references in the ALPL mutation database), we were able to qualify 184 mutations as severe or moderate alleles. Eighty-seven percent (140 of 160) of severe alleles were at conserved positions, 12% (19/160) at conservative positions, and one at a variable position (p.D406N), whereas 37% (9/24) of moderate alleles were found at conservative positions and 63% (15/24) at conserved positions (Table 5). Thus, although the evolutionary chart is efficient at distinguishing polymorphisms from mutations, it remains relatively ineffective at predicting the severity of mutations. However, five positions are subject to several different mutations whose functional consequences differ: A116S has a moderate phenotype, whereas A116T has a severe one; R136C has a severe phenotype, whereas R136H and R136L have a moderate one; F327C and F327G have a severe phenotype, whereas F327L has a moderate phenotype; I395T has a severe phenotype, whereas I395V has a moderate phenotype; and R450C has a severe phenotype, whereas R450H has a moderate phenotype. According to the substitution preferences of amino acids, i.e. favoring property conservation (41) (Table 3), we observed that the most favored substitution leads to a moderate phenotype, whereas the most disfavored substitution leads to a severe phenotype.
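The distribution of severe and moderate alleles across position classes can be recomputed from the counts given above. The following Python sketch only restates that arithmetic; the dictionary layout and function name are illustrative choices, and the zero for moderate alleles at variable positions is inferred from the totals (9 + 15 = 24).

```python
# Allele counts by severity and position class, as quoted in the text.
counts = {
    "severe": {"conserved": 140, "conservative": 19, "variable": 1},
    "moderate": {"conserved": 15, "conservative": 9, "variable": 0},
}


def shares(row):
    """Convert the counts of one severity class into percentages."""
    total = sum(row.values())
    return {kind: 100 * n / total for kind, n in row.items()}


distribution = {severity: shares(row) for severity, row in counts.items()}

if __name__ == "__main__":
    for severity, row in distribution.items():
        total = sum(counts[severity].values())
        summary = ", ".join(f"{kind} {value:.1f}%" for kind, value in row.items())
        print(f"{severity} (n={total}): {summary}")
```

Both severity classes are dominated by conserved positions, which is consistent with the observation that position class alone is a poor predictor of severity.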
These findings confirm that the nature of the amino acid change plays a determining role in the final function of the enzyme at both conservative and conserved positions. These observations also suggest that it may be interesting to combine such sequence alignment results with algorithms predicting the effect of the change in terms of secondary structure, spatial constraint, polarity, and biochemical properties.
The five in silico models dedicated to predicting the effects of amino acid substitutions in humans were not as efficient as our evolutionary chart (Table 4 and supplemental Table S3). It should be noted that tools that attempt to use structural factors to predict deleterious mutations are at a disadvantage, because there are many ways in which a protein can adapt to a site mutation that cannot be known a priori. For example, compensated pathogenic deviations are compensatory mutations found in a protein of one species that cancel the effect of disease-associated mutations in the orthologous protein of another species (1,2). These adaptations can explain why we observed seven residues (N53K, V67R, D233H, V331S, T384P, D406Y, and 465S) predicted as deleterious by the five in silico tools even though they are observed in other mammalian genomes. PolyPhen-2 and MAPP provided accurate prediction of sensitive positions in the TNSALP sequence (93.6 and 99%, respectively), but a number of amino acids were erroneously predicted as deleterious (9 and 26.5%, respectively). Indeed, because these residues are observed in nature, we can deduce that the corresponding substitutions are neutral. Mutation Assessor, SIFT, and SNAP were less efficient in providing accurate prediction of sensitive positions (82.8, 49.8, and 68.5%, respectively) than our evolutionary model and the two programs mentioned above, and they also misinterpreted neutral substitutions as being deleterious (9.1, 5.7, and 3.4%, respectively). Therefore, our evolutionary chart of sensitive positions of human TNSALP enables us to validate (or invalidate) at low cost any ALPL mutation suspected to be responsible for hypophosphatasia.
We showed that five in silico models are less efficient predictive tools than our predictive chart, which allowed us to validate more than 99% of the missense mutations in ALPL currently known to result in hypophosphatasia and, in contrast to the in silico models, did not provide false predictions. We also showed that our approach is more efficient in distinguishing causal mutations from polymorphisms and in avoiding false predictions (Table 4). Access to an efficient predictive tool will become increasingly necessary with the dramatic increase in standard and high-throughput sequencing data.