Evolutionary Trace of G Protein-coupled Receptors Reveals Clusters of Residues That Determine Global and Class-specific Functions

G protein-coupled receptor (GPCR) activation, mediated by ligand-induced structural reorganization of the receptor helices, is poorly understood. To determine the universal elements of this conformational switch, we used evolutionary tracing (ET) to identify residue positions commonly important in diverse GPCRs. When mapped onto the rhodopsin structure, these trace residues cluster into a network of contacts from the retinal binding site to the G protein-coupling loops. Their roles in a generic transduction mechanism were verified by the published mutational record: 211 of 239 mutations at these positions caused functional defects. When grouped according to the nature of the defects, these residues subdivided into three striking sub-clusters: a trigger region, where mutations mostly affect ligand binding; a coupling region near the cytoplasmic interface with the G protein, where mutations affect G protein activation; and a linking core in between, where mutations cause constitutive activity and other defects. Differential ET analysis of the opsin family revealed an additional set of opsin-specific residues, several of which form part of the retinal binding pocket and are known to cause functional defects upon mutation. To test the predictive power of ET, we introduced novel mutations in bovine rhodopsin at a globally important position, Leu-79, and at an opsin-specific position, Trp-175. Mutations at both positions were functionally critical, causing constitutive G protein activation and rapid loss of regeneration after photobleaching. These results define in GPCRs a canonical signal transduction mechanism in which ligand binding induces conformational changes propagated through adjacent trigger, linking core, and coupling regions.

The profusion and diversity of G protein-coupled receptors (GPCRs) give them a central role in health and disease. In humans, over 1000 genes encode these receptors (1), each of which responds to one or a few ligands by activating G proteins, which then modulate enzymes and channels to initiate highly amplified signaling cascades. Such cascades control sight, taste, smell, slow neurotransmission, and the responses to most water-soluble hormones and chemokines. In fact, GPCRs are so ubiquitous that, although they are the targets of nearly 50% of current drugs (2), these drugs still exploit only a small fraction of their pharmacological potential (3).
Some of the major questions relevant to GPCR pharmacology include the following: What residues are critical for ligand binding and G protein activation? What do different receptor families have in common with regard to their activation mechanism? From a structural perspective, it is known that all GPCRs form a bundle of seven transmembrane (TM) α-helices, connected by three intracellular and three extracellular loops, with an extracellular N terminus and an intracellular C terminus (4,5). However, overall sequence identity is low, only 25% even within class A GPCRs (6-8), which suggests that significant deviations can occur in ligand binding pockets and in the interhelical contacts that stabilize or mediate the transition between active and inactive conformations (9).
From an experimental perspective, while there is a wealth of data on a handful of GPCRs, most are known only from translated DNA sequences; hence the need for computational methods to extract functional information from those sequences. For example, correlated mutational analysis (10) and sequence-based entropy (11) have been used to detect networks of functional residues. Other studies have focused on receptor topology (12), functional fingerprints (13), dimerization interfaces (14), and modeling of receptor-ligand binding (15-17).
This study aims to test two specific and complementary hypotheses. The first is that all GPCRs share a common mechanism of activation and G protein coupling. This hypothesis is consistent with the common and often promiscuous activation of G proteins by GPCRs, even after whole TMs are swapped to form chimeric receptors (18). It also implies the presence of a ubiquitous GPCR activation switch made up of residues that are functionally important in all receptors. The second hypothesis is that residues involved in ligand specificity differ among receptors. This would be consistent with the enormous diversity of ligand sizes and types, ranging from ions and small molecules to peptides and large glycoproteins, and suggests that some residues will be important in some but not all GPCRs.
The search for these functionally important residues is based on the evolutionary trace method (hereafter ET, or tracing), which identifies sequence positions where variations among related proteins always correlate with evolutionary divergences (19). Control studies (20-22), genuine predictions followed by mutagenesis studies (23), and large-scale testing by us (24,25) and others (26-28) show that trace residues cluster in the three-dimensional structure of proteins and that these clusters predict binding or catalytic sites. Moreover, differential ET analysis, which subtracts residues traced on many evolutionary branches of a protein family from those traced on a subset of these branches, can also identify positions that are important to that subset but not to the entire protein family (21).
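Once each trace has been cut at a common percentile rank, differential ET as described above reduces to a set difference. The Python sketch below assumes ET ranks are already available as a mapping from aligned position to rank (lower means more important) and that ties are broken by position number; the function names and the toy ranks are hypothetical and are not taken from the study's software.

```python
import math
from typing import Dict, Set


def top_percentile(ranks: Dict[int, int], percentile: float) -> Set[int]:
    """Return the positions in the top `percentile` % of ET ranks.

    Assumption: the cutoff keeps ceil(n * percentile / 100) positions, and
    ties in rank are broken by position number.
    """
    n_keep = math.ceil(len(ranks) * percentile / 100)
    ordered = sorted(ranks, key=lambda pos: (ranks[pos], pos))
    return set(ordered[:n_keep])


def differential_trace(
    subset_ranks: Dict[int, int], global_ranks: Dict[int, int], percentile: float
) -> Set[int]:
    """Positions traced in the subset (e.g. one family) but not in the whole family."""
    return top_percentile(subset_ranks, percentile) - top_percentile(global_ranks, percentile)


# Hypothetical ranks for ten aligned positions (not data from this study).
global_ranks = {1: 1, 2: 4, 3: 2, 4: 9, 5: 7, 6: 3, 7: 8, 8: 5, 9: 10, 10: 6}
subset_ranks = {1: 1, 2: 2, 3: 3, 4: 1, 5: 6, 6: 4, 7: 2, 8: 9, 9: 8, 10: 7}
print(sorted(differential_trace(subset_ranks, global_ranks, 30)))  # [2, 4]
```

Position 1 drops out because it is important in both traces, while positions 2 and 4 are subset-specific in the sense used in the text.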
In keeping with these hypotheses, we find that a trace of diverse receptors identifies a common functional site shared by all GPCRs and that differential ET identifies ligand-specific functional sites. The published mutational record confirms that the former is a generic activation switch. Moreover, novel mutations at two trace residues reveal that they are linked to constitutive activity and protein stability, demonstrating the predictive value of the evolutionary trace model for guiding rational mutagenesis of GPCRs.
Global and Specific Determinants-Generically important residues were obtained by tracing this alignment (20,24,25), ranking the relative importance of each position, and selecting those in the top 5th, 10th, 15th, and 20th percentile ranks. The rank of a given position is the smallest number of branches into which the phylogenetic tree must be divided, counting from the root (the whole tree is one branch, rank 1), for the residue at that position to become invariant within each branch. To identify trace residues uniquely important to each family, we subtracted the generic ones from those that were important in ET analysis of that family, at the same percentile rank (see Supplementary Material).
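The rank definition above can be made concrete with a short sketch. It assumes the phylogenetic tree has already been converted into a list of nested partitions, where partition n divides the sequences into n branches counting from the root; how the tree is cut at each step is taken as given. The function name, the `Partition` alias, and the four toy sequences are hypothetical.

```python
from typing import Dict, List

Partition = List[List[str]]  # one list of sequence names per branch


def et_ranks(alignment: Dict[str, str], partitions: List[Partition]) -> List[int]:
    """Return the ET rank of every alignment column.

    partitions[n - 1] splits the tree into n branches, starting from the root
    (n = 1). A column's rank is the first n at which its residue is invariant
    within every branch.
    """
    length = len(next(iter(alignment.values())))
    ranks = []
    for col in range(length):
        for n, branches in enumerate(partitions, start=1):
            if all(len({alignment[name][col] for name in branch}) == 1 for branch in branches):
                ranks.append(n)
                break
        else:
            # Not invariant at any supplied partition; only possible if the
            # partitions stop short of single-sequence branches.
            ranks.append(len(partitions) + 1)
    return ranks


# Hypothetical four-sequence alignment and tree partitions.
alignment = {"A": "LWKGS", "B": "LWRGS", "C": "LFKAT", "D": "LFRAV"}
partitions = [
    [["A", "B", "C", "D"]],
    [["A", "B"], ["C", "D"]],
    [["A", "B"], ["C"], ["D"]],
    [["A"], ["B"], ["C"], ["D"]],
]
print(et_ranks(alignment, partitions))  # [1, 2, 4, 2, 3]
```

Lower ranks mark positions that stay invariant within groups that diverged early, so selecting the top percentile ranks amounts to keeping the lowest-ranked columns.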
Mutational Data-Functional data on trace residues were gathered from the tGRAP mutant data base (32), the protein mutant database (33), and the literature. For global trace residues, we considered mutations at cognate residues across all class A receptors, but for opsin-specific trace residues we used opsin mutations only. A ligand binding effect means that binding assays showed altered affinity or specificity. A G protein-coupling effect signifies altered GTP binding or inositol phosphate production under appropriate controls. Constitutive activity means signaling activity (inositol phosphate production, cAMP accumulation, or G protein activation) in a non-activated (ligand-free) receptor. Folding or expression effects mean considerably lower mutant receptor expression than wild-type under identical conditions. Subtle changes were not taken into account, and a residue with multiple functional effects was classified by the one most frequently observed (Fig. 1 and Table II). Two-tailed χ2 tests were performed as described in a previous study (34).
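The classification rule above, in which each residue is assigned by its most frequently observed effect, can be sketched in Python using the region labels from the Results. The effect codes, the function name, and the tie-breaking rule are assumptions made for illustration; the text does not specify how ties were handled.

```python
from collections import Counter
from typing import List, Optional

# Hypothetical effect codes mapped to the regions described in the Results.
REGION_BY_EFFECT = {
    "LB": "trigger",            # ligand binding
    "CA": "linking core",       # constitutive activity
    "Exp": "linking core",      # folding or expression
    "G protein": "coupling",    # G protein coupling/activation
}


def classify_residue(effects: List[str]) -> Optional[str]:
    """Assign a residue to a region by its most frequently observed mutational effect.

    Codes outside REGION_BY_EFFECT (e.g. "none" for mutations without effect)
    are ignored. Ties go to the effect encountered first, which is how
    Counter.most_common orders equal counts (an assumption, not the study's rule).
    """
    counts = Counter(e for e in effects if e in REGION_BY_EFFECT)
    if not counts:
        return None
    effect, _ = counts.most_common(1)[0]
    return REGION_BY_EFFECT[effect]


print(classify_residue(["LB", "CA", "LB", "none"]))              # trigger
print(classify_residue(["G protein", "Exp", "G protein"]))       # coupling
```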
DNA Mutagenesis and Expression of Opsins-All opsin variants were expressed using modified forms of a plasmid containing a synthetic opsin gene cDNA in a pMT3-based vector (35,36). Mutations were introduced using the QuickChange® mutagenesis kit (Stratagene). Expression in COS-1 cells, membrane isolation, reconstitution with 11-cis-retinal, and protein purification by 1D4 antibody affinity chromatography were carried out as described previously (36-40). Absorbance spectrophotometry was carried out using an Olis/SLM-Aminco DW-2000 dual-beam instrument adapted for darkroom use. Photolysis of rhodopsin was carried out using a continuous-wave argon ion laser directed into the sample compartment from above by a concave mirror that dispersed the beam evenly over the cuvette; the beam was gated with a manual shutter. Complete photoisomerization of rhodopsin and all mutants was achieved without significant heating in less than 1 s. Transducin activation assays were carried out as described previously (39).

RESULTS
Global Determinants-To find residues mediating a generic signal transduction mechanism, we jointly traced four evolutionarily distant families: the visual rhodopsin, bioamine, olfactory, and chemokine receptors. The 39 residues ranked in the top 20th percentile (Fig. 1 and Tables I and II) are predicted to be generically important. When mapped onto the rhodopsin structure, they form a three-dimensional cluster that is internal, located mostly in the cytoplasmic half of the membrane, and statistically significant (p = 0.0002, which is the probability that the same number of randomly selected residues would cluster as tightly in three dimensions).
FIG. 1. Global trace residues identify a canonical signal transduction pathway with three functional subdomains: a ligand trigger region, an allosteric linking core, and a G protein-coupling region. A shows the top 20% of class A determinants (Cα atoms) mapped onto the rhodopsin structure (1HZX), with retinal depicted as a yellow stick model. B shows only the subset that affects ligand binding on mutation (cyan spheres), forming the trigger region. C shows in blue the residues that cause constitutive activity or folding/expression effects on mutation. They cluster to form an intermediate linking core, involved in conformational activation, that links the trigger region to the coupling region shown in D (magenta spheres), which consists of residues that affect G protein coupling/activation. Ile-75 and Leu-79, which had not previously been assigned any function, are depicted as yellow spheres.
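The clustering p value above is defined as the probability that an equally sized random residue set clusters as tightly. As an illustration only, the Python sketch below estimates such a probability by Monte Carlo sampling, assuming that clustering is scored as the number of residue pairs whose Cα atoms lie within a cutoff distance. The cutoff, the function names, and the toy grid are hypothetical, and the clustering statistic used in the study may differ.

```python
import itertools
import math
import random
from typing import Dict, Sequence, Tuple

Coords = Dict[object, Sequence[float]]


def count_contacts(coords: Coords, selected: Sequence[object], cutoff: float) -> int:
    """Count selected residue pairs whose C-alpha atoms lie within `cutoff` angstroms."""
    return sum(
        1
        for i, j in itertools.combinations(selected, 2)
        if math.dist(coords[i], coords[j]) <= cutoff
    )


def clustering_p_value(
    coords: Coords,
    selected: Sequence[object],
    cutoff: float = 5.5,
    trials: int = 2000,
    seed: int = 0,
) -> Tuple[int, float]:
    """Monte Carlo estimate of how often a random residue set of the same size
    forms at least as many contacts as the selected (trace) set."""
    rng = random.Random(seed)
    observed = count_contacts(coords, selected, cutoff)
    pool = list(coords)
    hits = sum(
        1
        for _ in range(trials)
        if count_contacts(coords, rng.sample(pool, len(selected)), cutoff) >= observed
    )
    # Add-one correction so the estimate is never exactly zero.
    return observed, (hits + 1) / (trials + 1)


# Toy "structure": a flat 10 x 10 grid with 5-angstrom spacing (hypothetical).
grid = {(r, c): (5.0 * r, 5.0 * c, 0.0) for r in range(10) for c in range(10)}
block = [(r, c) for r in range(3) for c in range(3)]  # a compact 3 x 3 patch
observed, p = clustering_p_value(grid, block)
print(observed)  # 12
print(p)
```

A compact patch forms far more contacts than almost any random set of the same size, so its estimated p value is close to the smallest value the number of trials allows.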
To assess the functional importance of these 39 trace residues, we reviewed the published record of their mutations (Table II) and found 239 mutations (excluding structural studies) distributed among 37 trace residues. Of these, 211 (88%) gave rise to functional differences, including at least one at each of the 37 trace positions. Among residues mutated in two receptors, function was affected in both receptors in 6 of 7 cases (86%); among those mutated in three receptors, all three were affected in 7 of 8 cases (87%) and at least two in all 8 (100%). Nineteen positions were mutated in four or more receptors. In all 19 cases, function was disrupted in at least three different receptor types, and in 18 cases (95%) function was affected in four or more receptor types. This confirms the generic role of these residues in GPCR signaling.
To correlate the structural location of these trace residues with their function, we further subclassified them according to their most frequently observed mutational effect and mapped this information onto the rhodopsin structure (see Fig. 1 and Supplementary Material). Fourteen residues, colored cyan in Fig. 1B, predominantly affect ligand binding (Table II) and segregate in the extracellular half of the cluster. Sixteen residues, colored blue in Fig. 1C, cause constitutive activity, folding, or expression defects, and these segregate roughly in the middle of the cluster (Table II). Seven residues, colored magenta in Fig. 1D, primarily affect G protein coupling and signaling, and these fill the cytoplasmic base of the transmembrane domain. Thus specific functional effects localize strikingly in three regions: a trigger region, composed of residues most immediately involved in ligand-induced conformational changes; a coupling region adjacent to the cytoplasmic loops where G proteins bind; and, between them, a linking core composed of residues that stabilize TM interactions and folding. Two trace residues in this linking core, Ile-75 and Leu-79, have not previously been mutated, to our knowledge, and we mutated Leu-79 to test our prediction of its functional importance.
As a negative control, we also analyzed the mutational data on the 39 positions ranked in the bottom 20% by ET. Three positions had no mutational data, 14 had mutations that had no effect in any receptor, and only 3 had mutations that affected a generic GPCR function, namely expression (1 case) and G protein interaction (2 cases). Mutations of the remaining 19 residues affected ligand interaction, which is a ligand-specific function. In terms of individual mutations, 60 of 112 (54%) reported mutations at these positions had no apparent effect on function, compared with only 12% at trace residues (Table III and Supplementary Material). This greater than 4-fold difference is significant (p < 0.0005, χ2 test). Of the 52 mutations at low-ranked positions that had functional effects, 27 (52%) are at positions in the top 20% of family-specific residues identified by differential trace analysis (data not shown). Finally, whereas 27 of 37 (73%) top-ranked trace residues affect function in at least 3 receptors, only 7 of 36 (19%) low-ranked positions do so (see Supplementary Material). Thus top-ranked trace residues affect generic functions, and they do so in multiple receptor families, whereas bottom-ranked ones often have no effect, and when they do, the effect is mostly limited to ligand interactions in one or two receptors.
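The comparison above can be checked with a 2 × 2 contingency table built from the counts in the text. The sketch below runs a plain Pearson χ2 test without continuity correction, using only the Python standard library; the study's exact procedure (34) may differ, and the function name is hypothetical.

```python
import math
from typing import Tuple


def chi_square_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Pearson chi-square test (no continuity correction) for the table
    [[a, b], [c, d]]; returns the statistic and its p value with 1 degree of freedom."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper tail of a chi-square with 1 df: P(Z**2 > chi2) = erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Rows: top-ranked trace residues, bottom-ranked positions.
# Columns: mutations without effect, mutations with effect. Counts are from the
# text; 239 - 211 = 28 mutations without effect at trace residues.
chi2, p = chi_square_2x2(28, 211, 60, 52)
print(f"chi2 = {chi2:.1f}, p = {p:.1e}")
```

With these counts the statistic is about 71, far above 12.1, the one-degree-of-freedom critical value for p = 0.0005, which is consistent with the significance reported above.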
Retinal Binding Site in Opsin Family-We next asked whether differential trace analysis (21) could highlight functional determinants unique to the visual rhodopsin family. We traced 129 rhodopsin sequences and subtracted from the resulting trace residues the global ET residues (Fig. 1) at the same percentile rank (Fig. 2). At the top 20th percentile rank, 17 opsin-specific trace residues were found, of which 11 cluster near the retinal binding pocket (p < 0.001) (Table I and Supplementary Material). The other opsin-specific residues lie at or near the cytoplasmic edge. A retrospective analysis of the literature, summarized in Table IV, shows 32 known opsin family mutations at 12 opsin-specific trace residues. Of these 32 mutations, 28 (87%) are linked to spectral shifts, constitutive activity, night blindness, or retinitis pigmentosa. Interestingly, Trp-175 is the single apparent false positive. This is surprising because it is invariant across all opsin sequences, yet past mutations to other aromatic residues caused neither spectral shifts nor loss of G protein activation (46). We therefore focused on this residue, Trp-175, for further study.
Trace Residue Mutations in Rhodopsin-As a bona fide test of the ability of ET to guide function-altering mutations in GPCRs, we prepared mutants at one globally important residue, Leu-79 (L79A and L79S), and at one opsin-specific residue, Trp-175 (W175A, W175C, and W175H). All expressed at levels sufficient to allow isolation by immunoaffinity chromatography in detergent following reconstitution with 11-cis-retinal. The absorbance spectra in the rhodopsin form (with 11-cis-retinal bound) were essentially identical to wild-type (data not shown). All displayed wild-type ability to activate transducin upon exposure to light (Fig. 3).
There were, however, functional differences between the mutant proteins and wild-type opsin. As shown in Fig. 3, substitutions at both positions (W175A, W175C, W175H, and L79A) led to constitutive activity, i.e. G protein activation in the absence of ligand. Unlike wild-type opsin (47), none of the mutant opsins was stable enough to be isolated in detergent without added 11-cis-retinal, except L79S, which was obtained in very low yield (data not shown).
To determine the ability of the mutants to regenerate the dark photoreceptive state following photoactivation, we used a laser flash to effect nearly instantaneous quantitative photoisomerization of bound 11-cis-retinal to the all-trans form in the presence of excess 11-cis-retinal and then measured the maximum extent of regeneration by absorbance at 500 nm (Fig. 4). Nearly 80% of wild-type rhodopsin was regenerated under these conditions, whereas the Leu-79 mutants ranged from 40% to 45%, and the Trp-175 mutants ranged from 8% to 14% maximal regeneration.
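The regeneration values above follow from the procedure in the Fig. 4 legend: the increase in 500-nm absorbance after the flash, expressed as a percentage of the dark-state value and averaged over three repeats. A minimal Python sketch, assuming that the first post-flash reading serves as the baseline and that the error bars are sample standard deviations (neither is stated in the text); the function names and readings are hypothetical.

```python
import statistics
from typing import List, Sequence, Tuple


def percent_regeneration(a500_dark: float, a500_after_flash: Sequence[float]) -> float:
    """Maximal regeneration as a percentage of the dark-state A500.

    Assumes a500_after_flash[0] is read immediately after the flash and serves
    as the baseline; later readings track the regeneration time course.
    """
    baseline = a500_after_flash[0]
    return 100.0 * (max(a500_after_flash) - baseline) / a500_dark


def summarize(replicates: List[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation of replicate measurements."""
    return statistics.mean(replicates), statistics.stdev(replicates)


# Made-up readings for illustration only.
print(percent_regeneration(0.50, [0.02, 0.15, 0.22, 0.24, 0.24]))  # about 44
print(summarize([44.0, 40.0, 42.0]))
```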
Thus, both trace residues are functionally important. They contribute to keeping activity levels low in the absence of ligand. They also appear to have a role in protein stability, as indicated by defects in regeneration of the functional dark state following photoactivation in detergent. Moreover, in this example, evolutionary trace rank is related to the severity of the defect. Mutations at Leu-79, which ranks in the 20th percentile for the larger set of class A sequences but only in the 30th percentile for the opsin family, had a less severe effect on rhodopsin function than mutations at Trp-175, which ranks in the 5th percentile of importance for the opsin family. These results confirm that, as with soluble protein families examined previously, ET can be used to direct mutagenesis experiments efficiently to residues of functional importance in the transmembrane domains of GPCRs, and likely of other transmembrane proteins as well. Notably, we found two new sites at which mutations induce constitutive activity; previously, despite very extensive mutagenesis of rhodopsin, only seven such positions had been identified. This result highlights the ability of ET to find novel sites of interest even in intensively studied proteins and protein families.
DISCUSSION
The amino acid determinants of GPCR function should fall into two categories: those that mediate ligand binding, expected to be highly specific because of the chemical and structural diversity of ligands and the sequence divergence among receptors, and those that mediate the subsequent conformational change and receptor activation of G proteins, expected to be similar across receptors because of the promiscuity of G protein coupling (48), the activity of chimeric constructs (18), and the evidence for similar conformational changes (20, 28-30). Both parts of this hypothesis are validated by this study.
Global Trace Residues Reveal a Canonical GPCR Conformational Switch-First, a trace of four GPCR families with low sequence identity extracted common functional determinants. The structural clustering of these trace residues, and their contact network that is especially dense at the cytoplasmic end near the G protein-coupling interface, suggest that they participate in a canonical GPCR conformational switch. This notion is supported by existing mutations that show, first, that ET is highly specific, because mutations at every top ranked residue affect function in at least one receptor and only 12% of mutations at these residues have no effect, and second that trace residues are critical to many receptors, because mutations of 73% of them affect at least three receptors. In addition, correlation of functional effects of mutations with location revealed that the switch involves three functionally distinct but structurally connected sub-clusters: a trigger region (Fig. 1B), near the retinal binding pocket of rhodopsin, a coupling region (Fig. 1D) near the G protein-coupling site, and a linking core (Fig. 1C) between the first two.
The seven coupling region residues fill the cytoplasmic base of the transmembrane domain, presumably forming a platform for G protein interactions, and six of these residues border the key G protein-coupling loops 2 and 3 (Fig. 1D). Mutations at Leu-72 and Val-254 decrease G protein activation in rhodopsin, the vasoactive intestinal peptide receptor, and muscarinic receptors. Others in this region affect G protein coupling in rhodopsin and in angiotensin-2, thyrotropin, and α- and β-adrenergic receptors (Table II). Residue Tyr-136 of the (E/D)RY motif is especially important in triggering GDP exchange in G proteins (49,50). The generic trigger region extends into the transmembrane domain nearly up to the retinal binding pocket in rhodopsin (Fig. 1B), suggesting an intimate role in ligand sensing, which is supported by the mutagenesis data (Table II). The few residues far from the retinal binding pocket that affect ligand binding are likely coupled allosterically.
Most trace residues, however, lie in the intermediate linking core (Fig. 1C), drawn heavily from residues forming key hydrogen bond networks such as TM1-TM2-TM7 (critical role for Asn-55), TM2-TM3-TM4 (Asn-78), and TM6-TM7 (Met-257, Asn-302, and Pro-303) (50-52). Past mutations at these positions cause constitutive activity, misfolding, or expression defects, demonstrating a key role for these residues in maintaining structural integrity and dynamics. This is consistent with the linking core being an allosteric pathway that couples ligand binding to G protein activation through a conformational change initiated at the trigger region and extending to the coupling region (Table II). The three regions overlap to some extent, and their roles are also supported by our finding that mutations at Leu-79 caused constitutive activity (L79A) and decreased stability (both L79A and L79S).
ET Residues Cluster at Functional Sites in Opsins-Through differential analysis we also identified rhodopsin-specific trace residues, and they were indeed directly related to light sensing. Most cluster around the retinal binding site (Fig. 2C), and many influence interactions with the chromophore (Table IV). For example, Lys-296 is linked to retinal via a Schiff base, and introduction of a negative charge at Gly-90 (G90D) provides an alternative counterion to the positive charge of the protonated Schiff base and displaces the salt bridge between Lys-296 and Glu-113. Mutants E113Q and G90D cause a blue shift in the retinal absorption spectrum and constitutive activation of transducin, whereas mutations at Leu-125, Cys-167, and Pro-171 cause retinitis pigmentosa, poor retinal binding, misfolding, and reduced expression. Rhodopsin-specific residues in the cytoplasmic half of rhodopsin appear to affect G protein activation/coupling and TM-TM interactions (Fig. 2C), as revealed by mutations at Gly-51, Thr-58, and Val-250 (refer to Supplementary Material). Note that a family-specific residue could be important in some other families as well but is distinguished by not being important in all families.
We found no mutational data for Met-207, Met-288, and Phe-294 (near retinal) or for Gly-156 and Val-230 (near the cytoplasmic side). Their proximity to retinal or to other key residues suggests that they play a role in light sensing. For example, Phe-294 lies within ~8 Å of retinal and is adjacent to Lys-296, whereas Met-207 and Met-288 lie within 4 Å and 7.5 Å of retinal, respectively.
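Distances such as those quoted above are typically computed as the minimum interatomic distance between a residue and the ligand in the structure; the text does not say which atoms were used. A small Python sketch, assuming atom coordinates have already been parsed (for example from the 1HZX entry) into (x, y, z) tuples; the function name and coordinates are hypothetical.

```python
import math
from typing import Iterable, Sequence

Coord = Sequence[float]


def min_distance(residue_atoms: Iterable[Coord], ligand_atoms: Iterable[Coord]) -> float:
    """Smallest distance (same units as the input, e.g. angstroms) between any
    residue atom and any ligand atom."""
    ligand = list(ligand_atoms)
    return min(math.dist(a, b) for a in residue_atoms for b in ligand)


# Hypothetical coordinates, not taken from the rhodopsin structure.
side_chain = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
retinal = [(3.0, 4.0, 0.0), (20.0, 0.0, 0.0)]
print(min_distance(side_chain, retinal))  # 5.0
```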
Trp-175 is the one trace residue for which mutational data refute functional importance: mutation of this invariant residue to phenylalanine or tyrosine did not give rise to a spectral shift or a defect in light-triggered G protein activation (46). We hypothesized that its functional importance could be revealed by additional mutations or assays. Indeed, mutations W175A, W175C, and W175H all have severe functional effects in vitro, although all display normal absorbance spectra and light-triggered G protein activation. These results suggest that ET can be used not only to identify novel sites where mutations are likely to provide insight into functional importance (e.g. L79A), but also to flag positions where an apparent lack of functional effect upon mutation may need to be investigated in more detail (e.g. Trp-175).
Applied Phylogenomics-The present evolutionary study of GPCRs suggests some answers to the questions raised at the outset about the mechanisms for signal transduction and ligand sensing. First, distantly related Class A receptors share common determinants for ligand sensing, G protein coupling and allosteric relay/control of activation state. In our experience, whenever divergent members of a protein family share such extensive functional sites, their structures are also highly related. This mutual consistency of sequence, structure, and function during evolution is pervasive (24,25,31), and here it strongly suggests that GPCR transmembrane structures are similar at least in the vicinity of the G protein interface.
In practice, this study provides a systematic and rational guide for engineering GPCR mutations. First, as shown here in rhodopsin, differential ET opens the door to a strategy for identifying and testing likely ligand binding sites in other receptor families. Second, studies of the conformational switch targeted to the trigger, linking, and coupling regions may help control receptor sensitivity, constitutive activity, and coupling specificity, which is of broad pharmaceutical interest. More generally, ET analyses should similarly inform studies of other large families of membrane proteins, such as ion channels and ABC transporters. In summary, our results suggest that ET is a reliable and highly efficient method for identifying multiple functional sites and residues for further experimental study.
FIG. 3. Transducin activation by wild-type (WT) and mutant opsins in the presence and absence of 11-cis-retinal and light. Transducin activity was assayed using membranes from transfected COS cells as previously described (39). For each mutant listed, separate symbols show the reaction in the absence of retinal, in the presence of retinal, and the time course in the presence of retinal after exposure to light. Membrane amounts were selected to give the same light-dependent transducin activation kinetics; experiments with detergent-purified rhodopsins confirmed that the specific activities in the light were not measurably different for the mutants and wild-type (data not shown). For L79A (2X), twice the amount of membranes was assayed.
FIG. 4. Efficiencies of rhodopsin regeneration following photolysis. The 500-nm absorbance values of purified rhodopsins were measured in the dark. Subsequently, in the presence of a 2-fold molar excess of 11-cis-retinal, the samples were exposed to a 1-s flash of 514-nm laser light at an intensity sufficient to effect complete photoisomerization. The increase in 500-nm absorbance due to regeneration was recorded over 35 min to determine the maximal percent regeneration for each protein. The values are averages of three repeats for each mutant, and error bars indicate the standard deviations.