The carboxyl-terminal domain of phosphophoryn contains unique extended triplet amino acid repeat sequences forming ordered carboxyl-phosphate interaction ridges that may be essential in the biomineralization process.

Phosphophoryns (PPs), a family of Asp- and Ser(P)-rich dentin proteins, are considered archetypal regulators of several aspects of extracellular matrix (ECM) biomineralization. We have cloned a rat incisor PP gene, Dmp2, from our odontoblast cDNA library and localized it to mouse chromosome 5q21 within 2 centimorgans of Dmp1, the gene for another tooth-specific ECM protein. The carboxyl-terminal region of the Dmp2 protein (60 residue % Ser, 31 residue % Asp) is divided into two domains, one with unique repetitive blocks of [DSS]n, 3 ≤ n ≤ 14, the other with [SD]m, m = 2 or 3. Conformational analysis shows that the phosphorylated form of the [DS*S*]n repeats has a unique structure with well defined ridges of phosphates and carboxyls available for counter ion binding. The [S*D]m domains have different phosphate and carboxylate interaction edges and thus different calcium ion and apatite surface binding properties. These two domains, together with the colocalization of the Dmp1 and Dmp2 genes at a position equivalent to the dentinogenesis imperfecta type II locus on human 4q21, all suggest that the PPs are indeed involved in some aspect of ECM mineralization.

The phosphophoryns (PPs) (1) are a family of highly phosphorylated proteins with a unique composition and amino acid sequence. As the major noncollagenous proteins of the mineralized dentin matrix, the PPs have been considered the archetype of macromolecules that might regulate biomineralization (2,3) by binding to the matrix of structural proteins (4), nucleating mineralization (5,6), and controlling crystal growth (7). To understand how the PPs might carry out these diverse functions, it is crucial to know their amino acid sequences and structures. The PPs from various species contain 35 to 45 residue % aspartic acid and 40 to 55 residue % serine, and as many as 90% of the Ser residues are phosphorylated (8,9). The difficulty of Edman sequencing of phosphoserine-rich proteins has limited the information obtainable by conventional sequencing (10–12) to a few internal acidic domains and a short amino-terminal sequence. Recently, chemical sequencing with derivatization of the phosphoserines has provided a longer NH2-terminal sequence (13). Sequencing via cloning of a PP cDNA has also been difficult. A putative PP cDNA was described (14) several years ago, but no sequence data have been reported. Nucleotide probes have generally been unsuccessful because of the high redundancy of the Ser codons (15), and antibody screening of cDNA libraries has not been fruitful because of the difficulty of producing high titer PP antibodies (16,17).
We have recently taken a new approach to raising an antibody against a highly purified, dephosphorylated phosphophoryn (dPP). As described below, we have used that antibody to identify several cDNAs in our established rat incisor odontoblast cDNA library (15,18). Sequencing of the clones obtained thus far yielded the expected Asp- and Ser-rich composition. More importantly, it revealed both an unexpected, unique "triplet cassette" repeat motif and a "doublet" repeat motif. These motifs might account for several of the biomineralization regulatory functions of the phosphophoryns. We report here the nature of these unique sequence blocks. Furthermore, the identification of a PP cDNA clone has allowed the chromosomal localization of the corresponding PP gene. As shown below, the PP gene colocalizes with the locus for a mineralization disorder, dentinogenesis imperfecta type II, correcting an erroneous assignment (14).

EXPERIMENTAL PROCEDURES
PP and Antibody Preparation. PP was extracted from rat incisors by the procedure of Rahima and Veis (9). The high-Mr PP components were isolated from the crude extract by CaCl2 precipitation, followed by chromatography on a Bio-Gel DEAE 5-PW preparative column. Analytical gel electrophoresis showed three bands, with molecular masses of 95, 90, and 85 kDa. The 95-kDa protein (PP95K) was isolated by preparative gel electrophoresis followed by electroelution. PP95K was dephosphorylated with potato acid phosphatase (19). The dephosphorylated PP95K (dPP95K) was isolated by a second round of electrophoresis and electroelution. It was then coupled to keyhole limpet hemocyanin (KLH) using carbodiimide, and the KLH-dPP95K conjugate was injected into a goat. Booster injections followed at 16 and 25 days, and the immune serum was collected at day 42. The antibody was concentrated by ammonium sulfate precipitation and passed over an affinity column of dPP95K coupled to Sepharose 4B, from which the bound antibody was eluted.
Library Screening, Nucleotide Sequencing, Northern Analysis, and Tissue and Chromosomal Localization. Our λgt11 rat odontoblast expression library (15) was amplified on LB/agar plates and screened using standard procedures. The translation product is expressed as a β-gal-cDNA fusion protein when the lacZ gene is induced with isopropyl-1-thio-β-D-galactopyranoside. The fusion proteins were adsorbed to Hybond nylon membranes. The membranes were washed for 30 min in 50 mM Tris-buffered saline (TBS) containing 0.05% Tween 20 (TTBS), drained, and blocked at 4°C in 4% casein solution. The stock anti-dPP95K antibody was added at a 1:100 dilution in TBS, 3% bovine serum albumin and reacted with the membrane overnight at 4°C. The membranes were washed with TTBS, drained, and incubated for 3 h with peroxidase-conjugated rabbit anti-goat IgG at 1:1000 in TBS. After washing in TTBS, the membranes were developed with 4-chloro-1-naphthol in methanol/TBS. Positive clones were identified, collected by stab, and stored in suspension medium at 4°C. The clones were amplified by polymerase chain reaction using λgt11 forward and reverse primers and subcloned into the pGEM-T plasmid vector (Promega). Sequencing followed the Sequenase 2.0 (U. S. Biochemical) double-stranded DNA sequencing protocol, with electrophoresis on 6% polyacrylamide gels according to standard protocols.
An Invitrogen mRNA isolation kit was used to obtain mRNA from rat incisor odontoblast pulp, heart, liver, and muscle. For each tissue, 1 μg of mRNA was electrophoresed on a 1% formaldehyde gel, transferred to Hybond N+, and fixed by baking at 80°C. Membranes were prehybridized and hybridized by the Stratagene Quickhybridization method. A [32P]dCTP-labeled probe was prepared from clone 2PP by random priming with a nick translation labeling kit (Boehringer Mannheim). After hybridization, the filters were washed at 65°C with 1× SSC, 0.1% SDS and then autoradiographed.
The chromosomal location of the mouse PP gene, designated Dmp2 for dentin matrix protein 2 (the symbol PP has been used to designate another protein), was determined by interspecific backcross analysis. The progeny were derived from matings of (C57BL/6J × Mus spretus)F1 females and C57BL/6J males. This interspecific backcross mapping panel has been typed for over 2000 loci that are well distributed among all of the autosomes as well as the X chromosome (20). C57BL/6J and M. spretus DNAs were digested with restriction enzymes and analyzed by Southern blot hybridization with the rat cDNA probe for informative restriction fragment length polymorphisms (RFLPs) (21–23). The 7.8-kb BglI M. spretus RFLP was used to follow the segregation of the Dmp2 locus in the backcross mice. Recombination distances were calculated using the computer program SPRETUS MADNESS (24). Gene order was determined by minimizing the number of recombination events required to explain the allele distribution patterns.
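The gene-ordering criterion in the last sentence can be illustrated with a small sketch. It assumes that each backcross mouse is scored at each locus as "B" (homozygous C57BL/6J) or "S" (carrying the M. spretus allele). It then tries every locus order and counts the allele switches between adjacent loci that each order implies. The genotypes are invented toy data, not the panel described here. The helpers `count_switches` and `best_order` are hypothetical, and this is not how SPRETUS MADNESS is implemented.

```python
from itertools import permutations

# Hypothetical toy genotypes (NOT data from this study): each backcross
# mouse is scored at each locus as "B" (homozygous C57BL/6J) or
# "S" (carries the M. spretus allele).
TOY_MICE = [
    {"Fgf5": "S", "Dmp2": "S", "Dmp1": "S", "Adrbk2": "S"},
    {"Fgf5": "B", "Dmp2": "B", "Dmp1": "B", "Adrbk2": "B"},
    {"Fgf5": "B", "Dmp2": "S", "Dmp1": "S", "Adrbk2": "S"},
    {"Fgf5": "S", "Dmp2": "S", "Dmp1": "S", "Adrbk2": "B"},
    {"Fgf5": "S", "Dmp2": "B", "Dmp1": "B", "Adrbk2": "B"},
]


def count_switches(order, mice):
    """Total number of crossovers implied by a proposed locus order.

    A crossover is inferred wherever the allele changes between two
    loci that are adjacent in the proposed order.
    """
    total = 0
    for mouse in mice:
        alleles = [mouse[locus] for locus in order]
        total += sum(a != b for a, b in zip(alleles, alleles[1:]))
    return total


def best_order(loci, mice):
    """Brute-force the locus order that needs the fewest crossovers.

    Feasible only for a handful of loci (n! orders). An order and its
    reverse always tie, and min() returns the first minimum it meets.
    """
    return min(permutations(loci), key=lambda order: count_switches(order, mice))


if __name__ == "__main__":
    loci = ["Fgf5", "Dmp2", "Dmp1", "Adrbk2"]
    order = best_order(loci, TOY_MICE)
    print(order, count_switches(order, TOY_MICE))
```

In this toy data Dmp2 and Dmp1 never separate, so swapping them costs nothing. Their relative order stays unresolved, much as with the 0/150 result reported for the real panel.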
Structure Analysis of Model Peptides. Polypeptide structure was modeled by energy minimization using the DISCOVER CVFF repetitive build program (BIOSYM software) on a Silicon Graphics R-4000 X/Z graphics workstation.

RESULTS AND DISCUSSION
PP Sequence and Domain Structure. The acid phosphatase was ~95% effective in dephosphorylating PP95K. On the preparative gels, dPP95K migrated as a single sharp band that stained blue with Stains-All, and it was easily eluted. The KLH-dPP95K conjugate enhanced the immunogenic response to dPP95K, and the anti-dPP95K reacted with both dPP95K and PP95K. The anti-dPP95K was readily purified from an ammonium sulfate precipitate of the crude serum by passage over a PP95K-Sepharose 4B affinity column. At a 1:100 dilution of the antiserum, as little as 0.1 nmol of PP95K could be detected on slot blots.
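For scale, the slot-blot detection limit can be converted to a mass. The calculation below assumes that the nominal 95 kDa from the gel is the molar mass of PP95K (M ≈ 95,000 g/mol). The apparent electrophoretic mass of such a highly charged protein may differ from its true mass, so the result is only an estimate.

$$
m = nM \approx \left(0.1 \times 10^{-9}\ \mathrm{mol}\right)\left(9.5 \times 10^{4}\ \mathrm{g\,mol^{-1}}\right) = 9.5 \times 10^{-6}\ \mathrm{g} \approx 9.5\ \mu\mathrm{g}
$$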
Screening the odontoblast λgt11 cDNA expression library with the anti-dPP95K yielded about 20 clones. A clone that produced an ~2500-base pair nucleotide sequence, designated 2PP, was selected for further study. The coding region of the 2PP cDNA encodes the carboxyl-terminal portion of the corresponding protein, and the amino acid sequence deduced from the nucleotide sequence is shown in Fig. 1. We are completing the sequencing in the 5′ direction, but the highly repetitive nature of the sequence makes progress slow. The amino acid composition of the 245-residue carboxyl-terminal domain shows that Ser, Asp, and Asn make up 94% of this portion of the molecule, with Ser accounting for ~60%. There is a single Cys residue, and all of the other amino acids (Gly, Arg, Lys, His, Glu, and Thr) are hydrophilic. The carboxyl-terminal portion of the phosphophoryn sequence obtained thus far contains no hydrophobic residues and no proline.
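Residue percentages of the kind quoted above can be computed directly from a deduced sequence. This sketch uses a short hypothetical toy sequence, not the 2PP sequence of Fig. 1. It also relies on an assumed set of hydrophobic residues (A, V, L, I, M, F, W, Y). The helper names are illustrative only.

```python
from collections import Counter

# Assumed hydrophobic set for illustration; Pro is checked separately,
# as in the text.
HYDROPHOBIC = set("AVLIMFWY")


def residue_percent(seq):
    """Residue % (mol %) of each amino acid in a one-letter-code sequence."""
    counts = Counter(seq)
    return {aa: 100 * n / len(seq) for aa, n in counts.items()}


def summarize(seq):
    """Composition figures of the kind reported for the 2PP domain."""
    pct = residue_percent(seq)
    return {
        "Ser %": pct.get("S", 0.0),
        "Ser+Asp+Asn %": sum(pct.get(aa, 0.0) for aa in "SDN"),
        "hydrophobic residues": sorted({aa for aa in seq if aa in HYDROPHOBIC}),
        "has Pro": "P" in seq,
    }


if __name__ == "__main__":
    toy = "DSSDSSDSSNSDSDGK"  # hypothetical 16-residue toy sequence
    print(summarize(toy))
```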
However, even a cursory examination of the sequence in Fig. 1 shows two prominent sequence motifs. Most striking are the several sets of Asp-Ser-Ser (DSS) triads in the 245 residues of the carboxyl-terminal region. If the DSS triplets are treated as structural units, the sequence can be written as a series of triplet blocks. Residues are numbered in the amino-terminal direction, with the carboxyl-terminal residue as −1. This portion of the PP sequence can be divided into two domains, from −245 to −75 and from −74 to −1. The first domain, of 171 residues, contains the [DSS]n>1 repeats, with n as large as 14, and only 4 residues other than Asp or Ser. The carboxyl-terminal domain, from −74 to −1, has no extended DSS runs and contains all of the remaining non-Asp, non-Ser residues. As noted by the underlines, there are five [SD] repeat sequences, [SD]m with m = 2 or 3. These data agree with the conclusions of Sabsay et al. (11), who examined peptides released by enzymatic digestion and partial acid hydrolysis of intact phosphorylated PP. They concluded that the end regions must contain the majority of the non-Ser and non-Asp residues. They also concluded that these end regions flank a central core composed almost entirely of Ser and Asp(Asn).
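The domain description above amounts to scanning a sequence for runs of DSS triplets and SD doublets. The sketch below does this with regular expressions and reports positions in the numbering used here, with −1 as the carboxyl-terminal residue. The toy sequence and the function name `find_repeats` are hypothetical. The two patterns are searched independently, so in a real sequence a reported [SD]m run could overlap the end of a [DSS]n run.

```python
import re

# Runs of two or more DSS triplets, and of two or more SD doublets.
MOTIFS = {"DSS": re.compile(r"(?:DSS){2,}"), "SD": re.compile(r"(?:SD){2,}")}


def find_repeats(seq):
    """Locate [DSS]n and [SD]m runs in a one-letter-code sequence.

    Positions follow the numbering in the text: the carboxyl-terminal
    residue is -1, and numbers decrease toward the amino terminus.
    Returns (motif, copies, first_position, last_position) tuples
    sorted from the amino-terminal end.
    """
    hits = []
    for name, pattern in MOTIFS.items():
        for m in pattern.finditer(seq):
            copies = len(m.group()) // len(name)
            first = m.start() - len(seq)    # 0-based index -> negative numbering
            last = m.end() - 1 - len(seq)   # last residue of the run
            hits.append((name, copies, first, last))
    return sorted(hits, key=lambda hit: hit[2])


if __name__ == "__main__":
    toy = "GDSSDSSDSSNSDSDK"  # hypothetical toy sequence
    for hit in find_repeats(toy):
        print(hit)
```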
The existence of these two sequence domains raises the question of their structural consequences. Because the PP is about 90% phosphorylated as isolated from dentin, the (DSS)n domain was modeled as a fully Ser-phosphorylated polypeptide, (DS*S*)12. The model was energy-minimized on a Silicon Graphics R-4000 X/Z graphics workstation using the DISCOVER CVFF force field and automatically set parameters for 10,000 iterations, by which point the system energy had converged to −207 kcal/mol. The CVFF repetitive build program (25) uses a generalized force field with parameters designed specifically for minimizing polypeptide conformations. It is part of the INSIGHT II 950 molecular modeling system from BIOSYM/MSI (San Diego, CA). The potential electrostatic repulsions of the charged phosphate and carboxylate groups were taken into account, along with the other backbone and side chain interactions. The most probable structure is a nonplanar, folded, modified trans-extended backbone chain, as depicted in Fig. 2A. Three features are evident. First, because of the trans-peptide backbone, the adjacent S* phosphates in each triplet are on opposite edges of the chain, creating two ridges of phosphates. Second, two ridges of Asp carboxyls run alongside the phosphate ridges; these ridges are seen best in the end view in Fig. 2B. Third, and most important within the repetitive [DSS] domains, the [SDS] repeat creates pairs of phosphates along the same edge of the extended chain, separated by a single amino acid residue (Fig. 2A). The pairs of phosphates alternate in direction along the backbone. The phosphates in each pair are separated by 5.1 Å, whereas successive pairs along the same edge are separated by about 18 Å. Likewise, the carboxylate edges are very specifically structured and spaced along the peptide backbone. The repetitive DSS motif aligns a carboxyl of the next triplet with each pair of phosphates in the combined phosphate-carboxylate ridge (Fig. 2B). It will be important to investigate the potential calcium ion binding properties of such domains.
FIG. 2. Minimized structure of the (DS*S*)12 domain of 2PP in the absence of counter ion binding. A, a lateral view of the extended chain through three DS*S* repeats. It shows the folding or puckering of the backbone and the dispositions of the phosphate (purple) and carboxyl (green and red) groups relative to the backbone. The alignment of the phosphoserines in clusters along interaction edges offers the potential for calcium binding and mineral orientation. Note that the adjacent phosphates on the same edge reside on adjacent triplets, whereas the carboxyl group closest to each pair of phosphates is in the next triplet. If one numbered a sequence D1 S2 S3 D4 S5 S6 D7, the carboxyl of D7 would be closest to the phosphates on S3 and S5. Thus, creating the near-neighbor ionic concentration of two phosphates and one carboxylate requires the components of three successive triplets. B, an end view of the minimized structure emphasizes the formation of the ionic ridges along the peptide backbone, with the parallel and close interaction between carboxyl and phosphate ridges. C, the minimized structure of the S*DS*DS*D domain approximates a trans-extended chain. The carboxyls and phosphates line up on opposite sides of the chain, presenting ionic distributions very different from those along the DS*S* repeats.
In contrast, the [SD]m sequences are quite different. In the trans-extended chain conformation they have only two ionic edges, with all of the phosphates on one edge and all of the carboxylates on the other (Fig. 2C).
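The edge arguments for the two motifs can be checked with an idealized parity model. The sketch below assumes a strictly trans-extended chain in which consecutive side chains point to alternating edges (1, 2, 1, ...). It ignores the puckering of the minimized structure and treats every Ser as phosphorylated. The function names are hypothetical. The model reproduces the qualitative picture. In (DSS)n the S*DS* pairs alternate between edges, so pairs on the same edge recur every 6 residues. In the D1 S2 S3 D4 S5 S6 D7 example, D7 falls on the same edge as S3 and S5. In (SD)m, all Ser side chains lie on one edge and all Asp side chains on the other. Dividing the reported same-edge pair spacing of about 18 Å by 6 residues gives an implied rise of roughly 3.0 Å per residue. This is only a back-of-the-envelope consistency check, not a substitute for the minimized model.

```python
def edge_of(index):
    """Idealized trans-extended chain: side chains alternate between edges 1 and 2."""
    return 1 if index % 2 == 0 else 2


def phosphate_pairs(seq):
    """0-based start indices of S*-D-S* triads (two Ser on one edge, Asp between)."""
    return [i for i in range(len(seq) - 2) if seq[i : i + 3] == "SDS"]


def same_edge_pair_spacings(seq):
    """Residue offsets from each S*DS* pair to the next pair on the same edge."""
    starts = phosphate_pairs(seq)
    spacings = []
    for k, i in enumerate(starts):
        nxt = next((j for j in starts[k + 1 :] if edge_of(j) == edge_of(i)), None)
        if nxt is not None:
            spacings.append(nxt - i)
    return spacings


def edges_for(seq, residue):
    """Set of edges occupied by side chains of one residue type."""
    return {edge_of(i) for i, aa in enumerate(seq) if aa == residue}


if __name__ == "__main__":
    dss = "DSS" * 12  # model of the (DS*S*)12 domain
    sd = "SD" * 3     # an [SD]m block with m = 3
    print(sorted(set(same_edge_pair_spacings(dss))))  # [6]
    print(edges_for(dss, "S"), edges_for(sd, "S"), edges_for(sd, "D"))
    rise = 18.0 / 6   # reported ~18 angstrom same-edge spacing / 6 residues
    print(f"implied rise ~{rise:.1f} angstrom per residue")
```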
Tissue and Chromosomal Localization. At the biochemical level, the PPs have been found only in mineralized dentin. To verify that 2PP is expressed only in teeth, a 2PP probe was used in Northern analysis of RNA from a variety of tissues. The probe detected a single RNA band at ~7–7.5 kb in the odontoblast pulp mRNA (Fig. 3). None of the other tissue mRNAs, including brain, liver, and bone, showed any reactivity (data not shown), indicating that among the tissues examined 2PP mRNA is tooth-specific.
The chromosomal backcross mapping results indicated that the Dmp2 gene is located in the central region of mouse chromosome 5, linked to Fgf5, Dmp1, and Adrbk2. Although 111 mice were analyzed for every marker and are shown in the segregation analysis (Fig. 4), up to 191 mice were typed for some pairs of markers. Each locus was analyzed in pairwise combinations for recombination frequencies using these additional data (24). The most likely gene order, with the ratios of the number of mice with recombinant chromosomes to the total number analyzed for each pair of loci, is centromere–Fgf5–4/191–Dmp2–0/150–Dmp1–2/114–Adrbk2. The recombination frequencies, expressed as genetic distances in centimorgans ± standard error, are Fgf5–2.1 ± 1.0–[Dmp2, Dmp1]–1.8 ± 1.2–Adrbk2. No recombinants were detected between Dmp1 and Dmp2 in 150 animals typed in common, suggesting that the two loci are within 2.0 centimorgans of each other (upper 95% confidence limit).
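The genetic distances quoted above can be recomputed from the recombinant counts. The sketch assumes the usual binomial estimate: a distance of 100·r/n cM with standard error 100·√(p(1 − p)/n). For the zero-recombinant Dmp1/Dmp2 pair, it uses the one-sided 95% upper bound obtained by solving (1 − p)^n = 0.05. These formulas are assumptions about how the values were derived, although they reproduce the numbers given in the text. The function names are illustrative.

```python
import math


def distance_cm(recombinants, total):
    """Genetic distance (cM) and its binomial standard error (cM)."""
    p = recombinants / total
    se = math.sqrt(p * (1 - p) / total)
    return 100 * p, 100 * se


def upper95_cm_zero_recombinants(total):
    """One-sided 95% upper limit (cM) when no recombinants occur in `total` mice.

    Solves (1 - p) ** total = 0.05 for p.
    """
    return 100 * (1 - 0.05 ** (1 / total))


if __name__ == "__main__":
    for label, r, n in [("Fgf5-Dmp2", 4, 191), ("Dmp1-Adrbk2", 2, 114)]:
        d, se = distance_cm(r, n)
        print(f"{label}: {d:.1f} +/- {se:.1f} cM")
    print(f"Dmp1-Dmp2 upper 95% limit: {upper95_cm_zero_recombinants(150):.1f} cM")
```

Run as a script, this prints 2.1 ± 1.0 cM for Fgf5–Dmp2, 1.8 ± 1.2 cM for Dmp1–Adrbk2, and an upper limit of 2.0 cM for Dmp1–Dmp2.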
Our interspecific map of chromosome 5 has been compared with a composite mouse linkage map that reports the map locations of many uncloned mouse mutations (Mouse Genome Database, The Jackson Laboratory, Bar Harbor, ME). The central region of mouse chromosome 5 shares homology with human chromosomes 4q and 22q (summarized in Fig. 4). In particular, Fgf5 has been placed on human 4q21 and Adrbk2 on human 22q11. The placement of Dmp1 and Dmp2 between these two genes in the mouse suggests that Dmp1 and Dmp2 will also reside on 4q or 22q.
Aplin et al. (26) have confirmed that Dmp1 is on human chromosome 4q21. Thus, the above data demonstrate that clone 2PP represents a tooth (dentin pulp)-specific mRNA. Its gene, Dmp2, probably colocalizes on human chromosome 4q21 with the gene for another dentin-specific protein, Dmp1 (23). The inherited disorder of dentin mineralization, dentinogenesis imperfecta type II, has also been linked to human chromosome 4q21 (27). Bone sialoprotein and osteopontin have also been localized to this region (28), but neither has shown any defect related to dentinogenesis imperfecta type II (29). A prior report (30) concluded that the PP gene was not on 4q21 and was not related to dentinogenesis imperfecta type II. The present data show that this conclusion was in error, owing to misidentification of the PP gene.
Potential Function of PP. The unique nature of the phosphophoryns has prompted numerous discussions of their potential role in vivo. So have the many studies showing that in vitro they can modulate phosphate crystal nucleation and regulate crystal growth (5–7). These observations have also led to the treatment of this group of molecules as models for protein modulation of biomineralization (2,31). In spite of the monotony of their Ser and Asp composition, the present work demonstrates that the phosphophoryn molecule clearly has Ser- and Asp-rich domains of different structure. The unique [DSS]n triplet repeats create domains with very specific patterns of phosphates and carboxylates. These distinctive ionic patterns may be involved in the site-specific binding of phosphophoryn to collagen fibrils (32), in the nucleation of crystallization (6), or in the regulation of crystal growth patterns (7). Thus, it will now be important to use recombinant constructs of these domains to determine their interaction with calcium phosphate crystals and their effects on crystallization. Further, as sequencing through the NH2-terminal domains is completed, these and other constructs of the PP domains can be used to explore the mechanism of their phosphorylation. Work is already in progress to verify the relationship between tooth mineralization disorders and PP content and expression. These studies should ultimately reveal the function of this archetypal mineralized tissue-related molecule.
FIG. 3. Northern blot analysis with the 2PP probe. Poly(A)+ mRNA was prepared from 3-week-old rat incisors, and 1 μg of mRNA was used for the Northern blot. There is a single band, corresponding to a message length of approximately 7 kb, about twice the length required for production of a polypeptide of 95 kDa. Comparable Northern blots of bone, brain, and liver mRNA were blank, showing that the 2PP message level in these tissues was below the detection limit (data not shown).
FIG. 4. Chromosomal mapping of the mouse Dmp2 gene. DNA isolation, restriction enzyme digestion, agarose gel electrophoresis, Southern blot transfer, and hybridization were performed essentially as described (20). A major fragment of 13.0 kb was detected in BglI-digested C57BL/6J DNA, and a major fragment of 7.8 kb was found in BglI-digested M. spretus DNA. The presence or absence of the 7.8-kb BglI M. spretus-specific fragment was followed in backcross mice. The probes and RFLPs for the loci linked to Dmp2 have been reported previously (22,23). These loci include fibroblast growth factor 5 (Fgf5), dentin matrix protein 1 (Dmp1), and adrenergic receptor kinase, beta 2 (Adrbk2). Dmp1 and Dmp2 map to essentially the same position on chromosome 5. As indicated on the right, Fgf5 and Dmp1 have been mapped to human chromosome 4q21, and Adrbk2 has been mapped to human 22q11. The colocalization of Dmp1 and Dmp2 in the mouse suggests that they also colocalize on human chromosome 4q21.