J. Biol. Chem., Vol. 281, Issue 35, 25689-25702, September 1, 2006

From the Glycobiology Research and Training Center and the Departments of Cellular and Molecular Medicine, Medicine, and Pathology, University of California San Diego, La Jolla, California 92093-0687 and the Broad Institute of Massachusetts Institute of Technology and Harvard University, Cambridge, Massachusetts 02141
Received for publication, May 3, 2006, and in revised form, June 12, 2006.
| ABSTRACT |
|---|
|
|
|---|
| INTRODUCTION |
|---|
|
|
|---|
Sia residues are involved in many biological processes, often involving binding by intrinsic and extrinsic Sia-recognizing proteins. As an example of intrinsic Sia recognition, complement factor H uses Sia as a means to identify "self" and to prevent autoimmune attack by the alternative complement pathway; in contrast, foreign cells lacking Sia are not protected (5). Current evidence suggests that the CD33-related subset of Sia-recognizing Ig-like lectins (Siglecs) may also serve to recognize host Sia residues as self (4), thus dampening autoreactivity of cells of the innate immune response. Meanwhile, numerous vertebrate pathogens recognize and bind to glycan structures containing Sia residues (2), using them as portals to gain entry. Elimination of vertebrate host Sia production to avoid such pathogens is not an option, as this results in embryonic lethality (6). Complicating matters, many successful microbes express surface Sia residues, mimicking the host and avoiding recognition by many arms of the immune system (7). Taken together, these data suggest that Sia residues are involved in an ongoing biochemical "arms race" between hosts and pathogens, driven to diversify by "Red Queen" effects (see Footnote 6), even while conserving critical endogenous functions (4, 8).
Taking a "biochemical systems" approach to analyzing all extant data (see Fig. 1), we found that there are <60 known genetic loci directly involved in the biosynthesis, activation, transport, modification, transfer, recycling, degradation, and recognition of Sia residues within humans and other vertebrates (see Fig. 1 and Table 1). With the exception of the CD33-related Siglecs (CD33rSiglecs), all these loci are conserved between the human and mouse genomes, indicating their functional importance. Precursor molecules are first converted into Sia residues, which are then activated to CMP-Sia and transported to the Golgi apparatus, where members of a family of sialyltransferases transfer them onto the terminal ends of glycan chains in various types of structurally distinct linkages (see Fig. 1). Sia residues may also be modified from one form to another such as from Neu5Ac to Neu5Gc or by the addition of O-acetyl groups, and these alterations are differentially recognized by receptors on the same cell surface or on other cells. Sia residues attached to macromolecules are eventually cleaved from glycan chains in the lysosome, actively returned to the cytosol, and then recycled or degraded (see Fig. 1).
α2-6-linked Sia expression on selected cell types, presumably because of changed expression of the sialyltransferase ST6GAL1 (18); human-specific changes in one SIGLEC9 exon associated with the accommodation of Neu5Ac recognition by SIGLEC9 (19); human-specific loss of an entire primate-specific Siglec gene (SIGLEC13) (20); a human-specific gene conversion of SIGLEC11 causing changes in binding properties and newly induced expression in the brain (21); and selective down-regulation of CD33rSiglecs in human T cells (22). Additional studies suggest other species-specific gene conversion events among some hominid Siglecs (23) and other examples of human-specific changes in Siglec gene expression (22). The finding of so many human-specific functional differences from chimpanzees and other great apes within one biochemical/biological system suggests that it was subjected to major selective pressure(s) at some point(s) in human evolution.
Although all these genes are part of a well defined system (Sia metabolism and function), they are not represented as a single biological process in widely used genomic classification systems such as the Gene Ontology system (24) or PANTHER (25), which is also true of most other genes involved in glycan biology. These functionally related genes actually fall into diverse groups within the conventional Gene Ontology classification. They are also (with the exception of the CD33rSiglecs) randomly distributed throughout the genome. We suggest that all these genes should be evaluated together in a biochemical systems approach, considering the biosynthesis, activation, transport, modification, transfer, recycling, degradation, and recognition of Sia residues. Here, we undertake such an approach toward understanding the evolution of Sia biology in primates, rodents, and other mammals in combination with selected biochemical studies. We first investigate whether specific loci or functional classes of loci in this system have been subjected to adaptive selective pressures, whether any common principles emerge, and whether differences between chimpanzees and humans are more significant despite a shorter divergence time. We then take a biochemical approach to put the genomic data into context.
| EXPERIMENTAL PROCEDURES |
|---|
|
|
|---|
Identification of Chimpanzee Orthologs. Human RefSeqs or human genome sequences from 43 human loci (excluding the CD33rSiglecs) were used to extract orthologous coding sequences from the chimpanzee genome assembly (NCBI Build 1 Version 1) as identified by reciprocal best BLASTZ alignments (12). Phred quality scores for each site in chimpanzee sequences were also provided by the Chimpanzee Sequencing and Analysis Consortium (12). For eight of the human CD33rSiglecs (SIGLEC3, SIGLEC5-10, and SIGLEC12), high quality sequences were also obtained from our independent high resolution comparative analyses of human, chimpanzee, baboon, mouse, and rat (20). One more Siglec locus (SIGLEC13) is found in the chimpanzee and baboon genomes, but its complete deletion in humans was reported previously (20). SIGLEC13 was therefore used only for the domain-specific comparative analyses.
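The reciprocal best alignment criterion described above can be illustrated with a minimal sketch. This is not the pipeline used for the chimpanzee assembly; the function name `reciprocal_best_hits`, the score dictionary, and the locus labels are hypothetical and serve only to show the logic: a human/chimpanzee pair is retained when each sequence is the highest-scoring match of the other.

```python
def reciprocal_best_hits(scores):
    """Return {a: b} for pairs in which a and b are each other's best hit.

    `scores` maps (a, b) tuples to an alignment score (higher is better).
    Ties are resolved in favor of the pair seen first (an arbitrary choice).
    """
    best_for_a = {}
    best_for_b = {}
    for (a, b), score in scores.items():
        if a not in best_for_a or score > best_for_a[a][1]:
            best_for_a[a] = (b, score)
        if b not in best_for_b or score > best_for_b[b][1]:
            best_for_b[b] = (a, score)
    # Keep a pair only when the best hit of b points back to a.
    return {a: b for a, (b, _) in best_for_a.items() if best_for_b[b][0] == a}


# Hypothetical scores for three human (H) and two chimpanzee (C) sequences.
example_scores = {
    ("H1", "C1"): 100,
    ("H1", "C2"): 80,
    ("H2", "C1"): 95,
    ("H2", "C2"): 90,
    ("H3", "C2"): 85,
}
orthologs = reciprocal_best_hits(example_scores)
```

In this toy input only H1 and C1 select each other, so H2 and H3 would be left without an assigned ortholog and would need manual inspection.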
Mouse and Rat Orthologs. RefSeqs of mouse and rat orthologs were obtained from the NCBI LocusLink web site. Reliable mouse sequences were obtained for all but four loci; reliable rat sequences were obtained for all but nine loci. All rodent loci obtained have one sequence as the RefSeq, with the exception of the mouse St3gal2 locus. The representative sequence of the mouse St3gal2 locus was selected by following the procedure described for human loci. The high quality sequences of mouse and rat CD33rSiglec orthologs (Siglec3 and SiglecE-G) were obtained as described (20).
Evolutionary Analysis. Sequence alignments of coding regions were performed in ClustalW (26) and manually checked to see whether chimpanzee sequences had insertions or deletions causing frameshifts in the aligned open reading frame. These were handled with reference to human and mouse sequences, which showed identical open reading frames for all loci studied, except for one locus (NAGK; see supplemental text). Chimpanzee-specific insertions were assumed to be errors and were deleted to maintain an open reading frame even if they had high quality scores. Frameshifts caused by deletions were left in the alignments as gaps, but the codons they were located in were removed from the analyses (see supplemental text). The sequences modified by these processes are referred to as "modified" sequences in supplemental Table 1. In the alignments, some sites that are substitutions or indels between the human and chimpanzee sequences show low quality Phred scores in chimpanzee. Because these low quality sites could be artifacts from the sequencing and base-calling process, a second round of analyses was done in which such low quality chimpanzee sites were changed to match the human sequences at the sites in question. The chimpanzee sequences in which substitutions were modified to match the human sequences are referred to as "humanized" sequences in supplemental Table 1. Several chimpanzee sequences also show regions of non-called bases (represented by "N" in supplemental Table 1). Gene sequence regions that had non-called bases in the chimpanzee sequence were excluded from analyses.
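The two sequence clean-up steps described above (replacing low quality chimpanzee substitutions with the human base, and dropping codons that contain gaps or non-called bases) can be sketched as follows. This is an illustration only: the function names are hypothetical, and the Phred cutoff of 20 is an assumption, since no threshold is stated in the text. Indel handling by manual inspection is not modeled here.

```python
def humanize(human, chimp, quals, min_q=20):
    """Replace low quality chimpanzee substitutions with the human base.

    `human` and `chimp` are aligned sequences of equal length and `quals`
    holds one Phred score per chimpanzee site. `min_q` is an assumed cutoff.
    """
    if not (len(human) == len(chimp) == len(quals)):
        raise ValueError("sequences and quality scores must have equal length")
    out = []
    for h, c, q in zip(human, chimp, quals):
        is_substitution = c != h and c not in "-N" and h != "-"
        out.append(h if is_substitution and q < min_q else c)
    return "".join(out)


def drop_unusable_codons(seqs):
    """Remove codon columns in which any sequence has a gap or an N."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("aligned sequences must have equal length")
    keep = [
        i for i in range(0, length - length % 3, 3)
        if all(set(s[i:i + 3]).isdisjoint("-N") for s in seqs)
    ]
    return ["".join(s[i:i + 3] for i in keep) for s in seqs]


# Hypothetical nine-base alignment with one low quality mismatch (site 5).
humanized = humanize("ATGGCCAAA", "ATGGTCAAG", [40, 40, 40, 40, 10, 40, 40, 40, 40])
cleaned = drop_unusable_codons(["ATGGCCAAA", "ATG-CCAAG"])
```

In the example, the low quality T at the fifth site is changed to the human C, whereas the high quality mismatch at the last site is retained as a genuine difference.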
The evolutionary parameters shown in Table 1 were calculated in multiple species comparisons using human, chimpanzee, mouse, and rat. The numbers of synonymous (Ks) and nonsynonymous (Ka) substitutions per site were estimated by the method of Nei and Gojobori (27) with the Jukes-Cantor correction (28). Values for Ka and Ks were calculated using DnaSP Version 3.51 (29) or MEGA2 (30). Statistical tests were performed to assess the significance of evolutionary differences obtained in the analyses by using InStat Version 0.6 (GraphPad Software) or MEGA2.
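For readers unfamiliar with the calculation, the following is a minimal sketch of a Nei-Gojobori style estimate of Ka and Ks with the Jukes-Cantor correction, d = -(3/4) ln(1 - 4p/3). It is not the DnaSP or MEGA2 implementation, and results from those programs may differ in detail. The function names are hypothetical, and the sketch assumes gap-free, in-frame sequences without stop codons. Treating changes to stop codons as nonsynonymous when counting sites is an assumption of this sketch.

```python
from itertools import permutations, product
from math import log

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, with codons enumerated in TCAG order.
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def syn_sites(codon):
    """Number of synonymous sites in one codon (between 0 and 3)."""
    s = 0.0
    for pos in range(3):
        for base in BASES:
            if base == codon[pos]:
                continue
            mutant = codon[:pos] + base + codon[pos + 1:]
            # Assumption: a change to a stop codon counts as nonsynonymous.
            if CODE[mutant] == CODE[codon]:
                s += 1.0 / 3.0
    return s


def codon_diffs(c1, c2):
    """Mean (synonymous, nonsynonymous) differences over mutational paths."""
    positions = [i for i in range(3) if c1[i] != c2[i]]
    if not positions:
        return 0.0, 0.0
    paths = []
    for order in permutations(positions):
        cur, sd, nd = c1, 0, 0
        for pos in order:
            nxt = cur[:pos] + c2[pos] + cur[pos + 1:]
            if CODE[nxt] == "*":
                break  # discard paths that pass through a stop codon
            if CODE[nxt] == CODE[cur]:
                sd += 1
            else:
                nd += 1
            cur = nxt
        else:
            paths.append((sd, nd))
    if not paths:
        raise ValueError(f"no stop-free path between {c1} and {c2}")
    return (sum(p[0] for p in paths) / len(paths),
            sum(p[1] for p in paths) / len(paths))


def jukes_cantor(p):
    """Jukes-Cantor corrected distance for a proportion of differences p."""
    if p >= 0.75:
        raise ValueError("proportion too large for the Jukes-Cantor correction")
    return -0.75 * log(1.0 - 4.0 * p / 3.0) if p > 0 else 0.0


def ka_ks(seq1, seq2):
    """Return (Ka, Ks) for two aligned, in-frame coding sequences."""
    if len(seq1) != len(seq2) or len(seq1) % 3:
        raise ValueError("sequences must be aligned and a multiple of 3 long")
    s_sites = sd = nd = 0.0
    for i in range(0, len(seq1), 3):
        c1, c2 = seq1[i:i + 3], seq2[i:i + 3]
        if "*" in (CODE[c1], CODE[c2]):
            raise ValueError("stop codon in coding sequence")
        s_sites += (syn_sites(c1) + syn_sites(c2)) / 2.0
        d_syn, d_non = codon_diffs(c1, c2)
        sd += d_syn
        nd += d_non
    n_sites = len(seq1) - s_sites
    if s_sites == 0:
        raise ValueError("no synonymous sites in these sequences")
    return jukes_cantor(nd / n_sites), jukes_cantor(sd / s_sites)


# Hypothetical five-codon example: one synonymous and one nonsynonymous change.
ka, ks = ka_ks("GCCGCCGCCGCCAAA", "GCTGCCGCCGCCGAA")
```

A Ka/Ks ratio above 1 computed in this way is conventionally read as an excess of amino acid-changing substitutions. Short sequences yield noisy estimates.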
Protein Secondary Structure Prediction. For secondary structure prediction of the sialyltransferase loci, new joint method analysis was performed using web-based software at the Parallel Protein Information Analysis System (PAPIA) web site (available at www.cbrc.jp/papia-cgi/ssp_menu.pl) (61).
Lectin Staining of Sialic Acids on Tissue Sections. Paraffin sections of lung, kidney, and spleen samples from seven humans, eight chimpanzees, six rats, and six mice were deparaffinized, blocked, and overlaid with predetermined concentrations of biotinylated Sambucus nigra agglutinin (SNA) lectin or biotinylated Maackia amurensis hemagglutinin (MAH) lectin or with control reagent. Binding was detected by alkaline phosphatase-labeled streptavidin using Vector blue substrate, nuclear fast red counterstaining, and aqueous mounting. Samples were washed with Tris-buffered saline containing 0.2% Tween and 1% bovine serum albumin to block nonspecific binding. Digital photomicrographs were taken using an Olympus BH2 microscope, a MacroFire camera, and Adobe Photoshop.
Preparation of Erythrocyte Ghosts and Plasma. Blood from multiple taxa was collected directly into BD Vacutainer tubes containing EDTA, stored overnight at 4 °C, and then spun at 2000 × g for 10 min at 4 °C. The plasma was removed and stored frozen until further work-up. The buffy coat was removed, and the erythrocyte pellet was washed twice with 10 volumes of ice-cold phosphate-buffered saline (pH 7.4). Lysis of erythrocytes was accomplished by adding 15 volumes of ice-cold 10 mM Tris-HCl (pH 7.5) and 1 mM EDTA. The sample was transferred into glass Sorvall tubes and centrifuged at 10,000 × g for 20 min. The supernatant was carefully aspirated so as not to disturb the remaining "soft" pellet. The creamy particulate material that did not diffuse easily (representing contaminating white cells) was also removed. The tubes were filled with lysis buffer and centrifuged again. The process was repeated until ghosts were white. The last wash was made with ice-cold water containing 0.01% butylated hydroxytoluene as a preservative.
Glycopeptide Preparation from Erythrocyte Ghosts and Plasma. Plasma (0.5 ml) was lyophilized in a glass conical tube, and 250 µl of water was added. 0.5 ml of the ghosts was transferred into a glass conical tube, assuming ~50% water. The lipids were extracted from each of the above samples with 20 volumes of 2:1 (v/v) chloroform/methanol using a Brinkmann Instruments Polytron at a high setting for 30-60 s. The samples were centrifuged at 800 × g for 5 min after each extraction. All supernatants containing the lipids were pooled into a single glass vessel. Each sample was extracted again with 2:1 (v/v) chloroform/methanol. The pellets were extracted twice with 1:1 (v/v) chloroform/methanol and twice with 1:2 (v/v) chloroform/methanol. The remaining glycoprotein pellet was extracted with 95% ethanol, and the supernatant was also added to the pool. The glycoprotein pellet was immediately dissolved in 100 mM Tris-HCl (pH 6.5). Low molecular weight molecules were removed from the samples by performing dialysis using Mr 3500 cutoff tubing against a 500-fold volume of 100 mM Tris-HCl (pH 6.5) and 2 mM EDTA overnight at 4 °C. The retentate was recovered and digested with 0.1 volume of 20 mg/ml proteinase K made in 50 mM Tris-HCl (pH 8.0) and 2 mM calcium acetate, followed by incubation at 50 °C for 8 h. At the end of the day, another aliquot of the 10x proteinase K solution was added to the sample, and the digestion mixture was allowed to incubate overnight. The enzyme was inactivated by boiling for 10 min; the sample was centrifuged to remove particulates; and the resulting supernatant was loaded onto a 1-ml column of DEAE-Sephacel (GE Healthcare) equilibrated in 20 mM Tris-HCl (pH 6.5) and 0.1 M NaCl (62). The column run-through fraction was collected and reloaded onto the column. The column was washed with 30 ml of 20 mM Tris-HCl (pH 6.5) and 0.1 M NaCl. The column run-through fractions containing glycopeptides were pooled with the wash, and dialysis was performed against a 100-fold volume of water at 4 °C using Mr 1000 cutoff tubing for 12-16 h. The dialysis solution was changed to 2 mM EDTA for 8-12 h and changed back to water overnight. The sample from the dialysis tubing was recovered, frozen, and lyophilized. The resulting powder was dissolved in 1 ml of water, transferred to a smaller container, and frozen and lyophilized again. The resulting glycopeptides were recovered and weighed.
Analysis of N- and O-Glycans by High Performance Anion Exchange Chromatography with Pulsed Amperometric Detection (HPAEC-PAD). Free oligosaccharides were analyzed by HPAEC-PAD (31) on a CarboPac PA1 column (4 x 250 mm) in-line on a DX500 HPLC system equipped with a pulsed amperometric detector and a Thermo Separations AS3500 autosampler. The various oligosaccharides were eluted with a linear gradient of sodium acetate from 20 to 250 mM over 60 min in 100 mM sodium hydroxide. Data acquisition and processing were performed with Dionex PeakNet software. Elution profiles of the glycans were compared with those of standard N- and O-glycans of known elution behavior.
Determination of the Sialic Acid Types in Erythrocyte Ghosts and Plasma Samples. Sialic acids were released from the erythrocyte ghost or plasma glycopeptides by hydrolysis in 2 M acetic acid at 80 °C for 3 h. The released Sia residues were separated from high molecular weight proteins by passage through an Amicon Microcon-10 filter. The flow-through fraction was derivatized with an equal volume of 2x 1,2-diamino-4,5-methylenedioxybenzene reagent (32) and heated at 50 °C for 2.5 h. The fluorescently tagged sialic acids were separated on a Varian Microsorb-MV 100-5 C18 column (4 x 250 mm) in the isocratic mode using 85% water, 8% acetonitrile, and 7% methanol and detected with a SpectroVision FD-300 fluorescence detector with excitation set at 373 nm and emission at 448 nm. Elution profiles were compared with those of standard sialic acids of known elution behavior.
| RESULTS |
|---|
|
|
|---|
Some of these functional labels correspond generally to those listed for loci in the Gene Ontology (24) or PANTHER (25) data bases, but most are more specific in the context of Sia biology. A few loci appear to have additional functions or capabilities external to the Sia biology pathway, e.g. the RENBP gene product is also a renin-binding protein, and the PPGB gene product is a cathepsin protease that also serves to stabilize lysosomal β-galactosidase. By taking a systematic "sialic acid biochemistry-based" approach to grouping these loci rather than a strict categorical label via the current Gene Ontology scheme (see Footnote 8), we hoped to uncover information that is specifically relevant to the evolution and diversity of Sia biology in humans and other mammals.
Overall, the high rate of substitution and the relatively high Ka/Ks values suggest that the recognition category is evolving more rapidly than the others. This difference between gene categories may reflect a difference in evolutionary environment. Previous work in the anthocyanin pathway (39) suggested that genes upstream in a biosynthetic pathway tend to evolve more slowly than downstream genes. Although the Sia biology pathway as we have defined it here is not strictly linear, there is a general trend toward early acting loci such as those involved in biosynthesis and activation/transport/transfer evolving under more constraint than downstream loci such as those in the recognition category (Table 2).
Within the recognition group, Siglecs account for 56% (9 of 16) of human genes and 42% (5 of 12) of mouse and rat genes. Ka/Ks ratios for Siglec loci are significantly greater than those for non-Siglec members of the recognition group in both primates (p = 0.006) and rodents (p = 0.007) (Table 3), indicating that Siglecs are driving the higher values for this category. This difference appears to come mainly from an increase in Ka values rather than Ks values (Table 3), consistent with the notion that Siglecs may be undergoing adaptive evolution in humans and primates (19, 20). Indeed, comparisons of the chimpanzee and human genomes indicated that CD33rSiglecs are among the fastest evolving groups of genes in the entire genome (12).
Siglecs have multiple extracellular Ig-like domains, followed by a single transmembrane domain and a short cytoplasmic tail (4, 33). The first Ig-like domain (Ig1, V-set Ig-like domain) is known to be responsible for Sia recognition. Prior analyses have suggested domain-specific accelerated evolution associated with a functional change in the Ig1 domain of human SIGLEC9 (19), as well as a more rapid accumulation of nonsynonymous substitutions compared with an adjacent domain (Ig2, C2-set Ig-like domain) (20). These data indicate that Ig1 might be the target for evolutionary change in the primate lineage. We therefore re-examined our prior analyses (20), adding non-CD33rSiglec loci (SIGLEC1 and CD22) and excluding SIGLEC11 and SIGLEC5 (because of evidence of gene conversion) (21, 23). The Siglec loci thus used were as follows: human SIGLEC1, CD22, CD33, SIGLEC6-10, and SIGLEC12 (SIGLEC13 is deleted in the human genome (20)); chimpanzee SIGLEC1, CD22, CD33, SIGLEC6-10, SIGLEC12, and SIGLEC13; baboon CD33, SIGLEC6, SIGLEC8-10, and SIGLEC13 (SIGLEC7 and SIGLEC12 are deleted in the baboon genome; baboon SIGLEC1 and CD22 are not available (20)). For mouse and rat, we used the available reliable sequences, which were Cd33 and SiglecE-G. Orthology between primate and rodent CD33rSiglecs is unclear because several exon/domain-shuffling events appear to have occurred in the primate lineage (20). Thus, we could not reliably compare individual primate and rodent CD33rSiglec genes.
The above approach compares Ig1- and Ig2-coding sequences that are only ~400 and ~300 bp, respectively. To obtain more robust statistical power, we concatenated all available Siglec Ig1- or all Ig2-coding sequences for each species. For concatenated Ig1, all primate comparisons showed Ka/Ks > 1, indicating rapid evolution (Fig. 3). The mouse-rat comparisons did not show Ka/Ks > 1 (0.821), but the value is still rather high. In contrast, all comparisons of concatenated Ig2 sequences gave relatively low Ka/Ks ratios. We performed Fisher's exact tests to compare rates of synonymous and nonsynonymous evolution between concatenated Ig1 and concatenated Ig2. Concatenated Ig1 domains had a greater number of total substitutions than concatenated Ig2 domains in all species pairs. All species pairs also had significant differences in the proportions of nonsynonymous and synonymous substitutions between concatenated Ig1 and concatenated Ig2 (p < 0.010 for all four comparisons), with more nonsynonymous changes in concatenated Ig1 (data not shown). Taken together, the above findings indicate that an accelerated accumulation of nonsynonymous substitutions has occurred in Ig1 compared with Ig2 and that the Sia recognition function of the Siglec Ig1 domains is more rapidly evolving in at least two different mammalian clades, primates and rodents, with the highest rate in humans.
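The form of the Ig1 versus Ig2 comparison can be illustrated with a small two-sided Fisher's exact test on a 2 x 2 table of substitution counts (rows: Ig1 and Ig2; columns: nonsynonymous and synonymous). The sketch below is a generic implementation using only the Python standard library, not the statistical software used in this study, and the counts in the example are hypothetical because the underlying data are not shown.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]].

    The p value is the summed probability of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        # Hypergeometric probability of x counts in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    low, high = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Small relative tolerance guards against floating point ties.
    p = sum(prob(x) for x in range(low, high + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, p)


# Hypothetical counts: Ig1 with 30 nonsynonymous and 10 synonymous changes,
# Ig2 with 8 nonsynonymous and 14 synonymous changes.
p_value = fisher_exact_two_sided(30, 10, 8, 14)
```

A small p value here would indicate that the proportion of nonsynonymous changes differs between the two domains, which is the kind of result reported above for all four species comparisons.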
Sialyltransferase Sequences Are Highly Conserved, but Their Tissue Expression Patterns Are Not. Sialyltransferases are responsible for the formation of sialylglycoconjugates by transferring the Sia group from CMP-Sia to one of many possible glycoconjugate acceptors. In striking contrast to the Sia-recognizing proteins, sialyltransferase sequences were found to be highly conserved among primates and rodents (Table 2 and supplemental Tables 1 and 2). Despite this, we found that the actual tissue pattern of Sia linkages generated by these enzymes varies widely across different tissue types among humans, chimpanzees, mice, and rats (Fig. 4). Using the lectins SNA and MAH to detect α2-6- and α2-3-Sia linkages, respectively, we found many interspecies differences and only a few consistent similarities (Fig. 4). For example, expression of SNA-positive α2-6-linked Sia in lung bronchioles was human-specific. α2-6-Linked Sia is also expressed in B cell areas of the spleen in human, chimpanzee, and mouse, but not in rat. SNA reactivity in the red pulp area of the spleen was seen only in chimpanzee. In kidney distal tubules, expression of α2-6-linked Sia was found discordantly in human and rat. However, expression of this linkage is preserved across all four species in endothelial cells and kidney glomeruli. Expression of MAH-positive α2-3-linked Sia was also found in T cell areas of the spleen and in kidney glomeruli of all four species examined. In contrast, it was seen only in chimpanzee lung bronchial epithelium goblet cells and chimpanzee spleen red pulp. Thus, each species appears to have experienced specific gains and losses of Sia expression, despite general conservation of sialyltransferase sequences.
Species-specific Changes in Sialylmotifs. Although the causes of species-specific differences in sialylation are mostly unclear, a few focused sequence changes in sialyltransferase catalytic domains could have effects on sialyltransferase action. All eukaryotic sialyltransferases have four conserved peptide regions in their catalytic domains, referred to as sialylmotifs L (long) and S (short) (40), 3 (41), and VS (very short) (42). Sialylmotif L is involved mainly in donor substrate binding (43), and sialylmotif S is important for binding to both donor and acceptor substrates (44). We identified a number of species-specific amino acid changes in the sialylmotif regions of several sialyltransferases. Because crystal structures of sialyltransferases are not currently available, protein secondary structure prediction was performed to obtain information about consequences of these species-specific amino acid changes. Comparison of predicted locations of helix, coil, and sheet structures among primates and rodents suggests that one locus (ST8SIA3) has potentially important structural changes between rodents because of both mouse- and rat-specific amino acid changes and that two additional loci (ST6GALNAC3 and ST8SIA2) show potentially major structural changes in primates resulting from human-specific amino acid changes (Fig. 5, A and B). Of these, the human-specific change in ST8SIA2 is of particular interest because this enzyme appears to be expressed mainly in fetal brain (45) and generates polysialic acid chains, which are known to be involved in regulating neural plasticity and neurite outgrowth (46-48).
Species-specific Diversity of Sialic Acid Types. Although the above profiling method has many advantages, one limiting factor is the fact that two common types of Sia residues (Neu5Ac and Neu5Gc) can cause significantly different elution properties for glycans to which they are attached. Indeed, some of the most striking differences between human and great ape samples could be partly due to the human lack of Neu5Gc. Another problem is that the hydrazinolysis procedure can result in some loss of N-glycolyl groups (converted into N-acetyl groups upon re-acetylation) and complete loss of O-acetyl esters on sialic acids. Thus, we also quantified the relative amount of different kinds of sialic acids in the erythrocyte ghost and plasma glycopeptides (Table 4). Although human ghost and plasma glycans contain primarily Neu5Ac, great apes contain predominantly Neu5Ac in plasma but mostly Neu5Gc in ghosts. Only small amounts of 9-O-acetylated Neu5Ac were seen in these hominids. Rat and horse exhibited high levels of O-acetylated Neu5Ac, whereas the other taxa showed little to none. Only orangutan appeared to have O-acetylated Neu5Gc, in contrast to the other primates as well as other mammals. Overall, we can conclude that both sialic acid diversity and expression are rapidly evolving among different taxa.
| DISCUSSION |
|---|
|
|
|---|
There appear to be different selective pressures between gene categories involved in Sia biology, as evidenced by differences in divergence rates and rates of evolution as measured by Ka/Ks ratios across categories. Sia-recognizing molecules in particular appear to be very rapidly evolving, and the acceleration in Siglec molecules that recognize Sia residues is consistent with the hypothesis that these loci play important roles in host immune modulation. Loci involved in Sia biosynthesis appear to be under stronger functional constraint in both primates and rodents. However, there is a striking disparity between the level of coding sequence conservation and species-specific expression of sialyltransferase products. The precise mechanisms and consequences of these unique species-specific expression patterns are currently unknown. Previous work has suggested that the expression of one sialyltransferase (ST6GAL1) may be regulated either by differential promoter usage or by changes in the expression of transcription factors (53-55). This may be the case for all sialyltransferase loci, as their generally high level of coding sequence conservation suggests that factors other than simple amino acid changes may be responsible for the patterns of interspecific expression variation. As for the additional rapid evolution of sialic acid types, most of the relevant genes have not yet been identified and cloned, so it cannot yet be determined whether coding sequence changes or regulatory changes are responsible for these patterns. Regardless of the underlying cause(s) of this rapid evolution, the fact that tissue sialylation patterns differ so widely among such closely related taxa raises caution about the use of animal model systems to understand human glycosylation-related disorders.
Overall, it appears that two distinct modes of rapid evolution are taking place in Sia biochemistry and biology. Within the CD33rSiglecs, there are ongoing changes in the actual amino acid sequences of the Sia-binding Ig-like domain associated with changes in binding activity. In contrast, the expression patterns of the sialyltransferases (and glycans in general) are rapidly diverging within mammals, even while their primary amino acid sequences remain conserved. Although these are different classes of loci that operate in different parts of the Sia life cycle, these two phenomena are related by the fact that the CD33rSiglecs recognize Sia residues originally placed onto glycan chains by the sialyltransferases. Overall, the current data are consistent with a recently proposed evolutionary scenario (4) predicting that terminal sialylation would have evolved more rapidly than other systems to evade pathogenic infections. Thus, whereas the sialyltransferase expression patterns defining the host sialome are rapidly evolving to evade pathogens that use Sia residues as targets for binding (a Red Queen effect) (8), the Sia-binding sites of CD33rSiglecs (which are thought to have the ability to recognize the self-sialome) are also rapidly evolving to keep up with the constantly changing sialome, resulting in a secondary Red Queen effect (4). It is also possible that CD33rSiglec Sia-binding sites need to simultaneously evolve rapidly to directly evade pathogens that express Sia residues, another primary Red Queen effect (4).
Interestingly, a second class of Sia-recognizing molecules, the selectins, did not show a similar rapid evolution of their Sia-binding C-type lectin domains. Although both the Siglecs and the selectins bind Sia residues, the selectins differ from the Siglecs in their recognition specificity and functions. Siglecs discriminate subtle differences in the specific Sia involved, such as its underlying linkage, charge, and side chain type (4, 33). Unlike Siglecs, however, selectins do not require the entire sialic acid molecule for recognition, just the negative charge, which can even be provided by a sulfate ester at the same 3-position of galactose (35, 36). Thus, selectins should be under less pressure to evolve rapidly to match the host sialome. Also, whereas Siglecs appear to have both intrinsic and extrinsic recognition functions, selectins are thought to act primarily in intrinsic recognition processes in vascular biology. This suggests that intrinsic recognition is under stronger constraint than extrinsic recognition in Sia biology. Indeed, there are no amino acid substitutions between humans and chimpanzees in any of the selectin C-type lectin domains (data not shown), suggesting that these regions are under stronger functional constraint and less diversifying pressure.
Recent studies have suggested examples of domain-specific rapid evolution in settings in which Ka/Ks ratios for the entire genes showed no significant differences (19, 20). This hypothesis is supported by domain-specific analyses of the Siglec loci, which suggest more rapid evolution of the functional Sia-binding domain than adjacent domains. Also of note is the fact that we have so far not found as many major differences in Sia biology-related genes in rodents as in primates, despite the much greater time since their evolutionary divergence. Taken together, the data imply that the primate lineage, specifically the human lineage, has experienced differential selection pressures affecting Sia biology. More focused study of candidate loci and biochemical differences may help elucidate the causative mechanisms.
It has been suggested that the majority of gene expression differences between species are not necessarily functional adaptations, but rather the consequence of neutral or nearly neutral substitutions (56). If few gene expression changes are adaptive, then it may be even harder to see signatures of selection at the genomic DNA level. This underscores the important role that functional and biochemical studies must play in validating the existence and importance of biological changes between species. Our biochemical data underscore this fact, as we see marked species-specific differences in sialylation profiles between mammalian taxa that would not be predictable from sequence data alone. One additional way to help clarify genomic evidence for natural selection will be a population genetic approach, placing intraspecies polymorphism data in the context of divergence, to detect the footprint(s) of natural selection in these species. Functional and population genetic studies on several of these loci are underway to determine how these genetic and biochemical differences among primates and rodents may have contributed to functional phenotypic consequences relevant to the biological evolution of our species.
| FOOTNOTES |
|---|
The on-line version of this article (available at http://www.jbc.org) contains supplemental text, references, and supplemental Tables 1-3.
1 Both authors contributed equally to this work.
2 Present address: Research Inst. for Microbial Diseases, Osaka University, Suita, Osaka 565-0871, Japan.
3 To whom correspondence should be addressed: Dept. of Cellular and Molecular Medicine, Mail Code 0687, 9500 Gilman Dr., University of California, San Diego, La Jolla, CA 92093-0687. Tel.: 858-534-2214; Fax: 858-534-5611; E-mail: a1varki{at}ucsd.edu.
4 The abbreviations used are: Sia, sialic acid; Neu5Ac, N-acetylneuraminic acid; Neu5Gc, N-glycolylneuraminic acid; Siglecs, Sia-recognizing Ig-like lectins; CD33rSiglecs, CD33-related Sia-recognizing Ig-like lectins; SNA, S. nigra agglutinin; MAH, M. amurensis hemagglutinin; HPAEC-PAD, high performance anion exchange chromatography with pulsed amperometric detection.
5 The term sialome denotes the total complement of Sia types and linkages and their modes of presentation on a particular organelle, cell, tissue, organ, or organism as found at a particular time and under specific conditions (4).
6 The Red Queen effect in evolution refers to the observation to Alice by the Red Queen that "it takes all the running you can do, to keep in the same place." Complex multicellular animals with long life cycles must evolve rapidly to survive the attacks of microbial pathogens that can replicate much faster (57, 58).
7 The term great apes (including chimpanzees, bonobos, gorillas, and orangutans) is used here in the colloquial sense, as phylogenetic analysis of genomic information no longer supports this species grouping (59). Under the currently common classification, these species are now grouped together with humans in the family Hominidae.
8 Sometimes the molecular function or biological process listed in Gene Ontology is not wholly descriptive of a gene product's function. For example, searching CMAS returns biological processes of "CMP-N-acetylneuraminate biosynthesis," "lipopolysaccharide biosynthesis," molecular function of "N-acylneuraminate cytidylyltransferase activity," and cellular component of "nucleus." But no statement indicates that this gene is involved in the biosynthesis of sialylated glycans.
9 R. E. Taylor, T. K. Altheide, A. Varki, unpublished data.
| ACKNOWLEDGMENTS |
|---|
| REFERENCES |
|---|
|
|
|---|