Residues and residue pairs of evolutionary importance differentially direct signaling bias of D2 dopamine receptors

The D2 dopamine receptor and the serotonin 5-hydroxytryptamine 2A receptor (5-HT2A) are closely-related G-protein–coupled receptors (GPCRs) from the class A bioamine subfamily. Despite structural similarity, they respond to distinct ligands through distinct downstream pathways, whose dysregulation is linked to depression, bipolar disorder, addiction, and psychosis. They are important drug targets, and it is important to understand how their bias toward G-protein versus β-arrestin signaling pathways is regulated. Previously, evolution-based computational approaches, difference Evolutionary Trace and Evolutionary Trace–Mutual information (ET-Mip), revealed residues and residue pairs that, when switched in the D2 receptor to the corresponding residues from 5-HT2A, altered ligand potency and G-protein activation efficiency. We have tested these residue swaps for their ability to trigger recruitment of β-arrestin2 in response to dopamine or serotonin. The results reveal that the selected residues modulate agonist potency, maximal efficacy, and constitutive activity of β-arrestin2 recruitment. Whereas dopamine potency for most variants was similar to that for WT and lower than for G-protein activation, potency in β-arrestin2 recruitment for N124H3.42 was more than 5-fold higher. T205M5.54 displayed high constitutive activity, enhanced dopamine potency, and enhanced efficacy in β-arrestin2 recruitment relative to WT, and L379F6.41 was virtually inactive. These striking differences from WT activity were largely reversed by a compensating mutation (T205M5.54/L379F6.41) at residues previously identified by ET-Mip as functionally coupled. The observation that the signs and relative magnitudes of the effects of mutations in several cases are at odds with their effects on G-protein activation suggests that they also modulate signaling bias.

G-protein-coupled receptors, the dominant mechanism by which we sense our external and internal environment, convey their signals through divergent downstream signaling pathways, particularly G-protein activation and β-arrestin signaling. Of these, the β-arrestin pathway appears to be more complex and is less completely understood. Signaling events depend not only on agonist binding to the receptor and the resultant conformational changes, but also on agonist-dependent phosphorylation and multiple protein-protein and protein-lipid interactions occurring both at the cell surface and at the surface of intracellular endosomal vesicles (1-5).
Although both G-protein activation and β-arrestin signaling result from agonist binding, they do so on distinct time scales and, in at least some cases, in different locations and molecular contexts (6,7). Considerable evidence suggests that the conformational states of the receptors triggering these pathways are also different (8,9). Understanding the molecular mechanisms for these diverging pathways is important for understanding their roles in cellular physiology. It is also important for understanding the pharmacology of existing and yet-to-be discovered therapeutics, most of which act directly or indirectly on GPCR pathways (10-14). In this work we use the term "bias" to indicate the relative tendency for a ligand or a receptor to signal via one pathway versus another, particularly as compared with the responses of the WT receptor to its endogenous ligand, dopamine.
The D2 dopamine receptor is arguably the GPCR of most pervasive neuropharmacological importance, due to its role in the actions of virtually all psychotropic drugs and in human disorders ranging from substance abuse to Parkinson's disease (2,15-19). Recent in vivo studies, using either biased drugs selected for preferential activation of one pathway or the other downstream of D2R (20) or D2R variants engineered to trigger β-arrestin or G-protein pathways selectively (21), have confirmed that D2R-directed drugs exert different behavioral effects through these two pathways.
Previously, we have used D2 dopamine receptors as a model system for testing the predictive value of evolution-based computational tools for identifying amino acid residues and pairs of residues important for specificity, potency, and efficacy of GPCR signaling; the results have confirmed their value in this regard (22,23). Note that we use the term "efficacy" here in its normal meaning (the "ability to effect a desired result") with the result of interest being the response in our assays, and not "efficacy" as used in describing drug responses in clinical pharmacology. In the work presented here, we have tested the roles of these computationally identified residues in β-arrestin engagement and activation. Comparison of their roles in G-protein versus β-arrestin signaling also serves to validate the evolutionary importance of fine-tuned signaling bias in GPCRs. Although there have been a number of studies on residues at the GPCR cytoplasmic surface regulating differential interactions with β-arrestins and G-proteins (24-28), our focus here is on residues within the seven transmembrane helices that convey the allosteric signal from the ligand-binding site to the cytoplasmic surface (29-31).
The computational techniques employed are the Evolutionary Trace (22,23,32-37), which ranks residues in terms of their evolutionary sensitivity to mutations, and a synthesis of the Evolutionary Trace with mutual information-based assessment of co-evolution and functional coupling termed ET-Mip (23), which ranks pairs of residues by their correlated impact on function. The experimental method employed for assessing β-arrestin signaling is the TANGO assay (38), routinely used to test the effects of compounds on this signaling pathway (39). The assay is exquisitely sensitive because of its highly amplified nature, employing transcription factor release and chemiluminescent detection of luciferase expression. It reports exclusively on physical interactions between an engineered version of the receptor and an engineered version of β-arrestin2, so many of the subtleties of this complex pathway may be missing. However, it does provide a useful measure of relative recruitment efficiencies in a cellular context.

Residue selection
The difference ET algorithm was used to select residues differing between D2R and 5-HT2A within the transmembrane domain as described previously (22). Fourteen residues ranked as being of high evolutionary importance (top 40%) were chosen, but three of lower importance (bottom 20%), I105K 3.23, A188N 5.37, and I195F 5.44, were also studied as negative controls. The amino acid residues chosen to replace the WT residues at the selected positions were those found in the 5-HT2A receptor, based on the hypothesis, largely borne out by experiments (22,23), that those substitutions are compatible with proper folding and stability but associated with distinct signaling properties. Two double mutations, I48T 1.46/F110W 3.28 and T205M 5.54/L379F 6.41, were chosen to assess functional coupling based on analysis by the ET-Mip algorithm, which ranks pairs of residues according to their mutual information integrated into the ET framework as described previously (23). The resulting set of 19 mutants (Fig. 1), along with WT D2R, was tested in the β-arrestin2 recruitment assay. None of the mutated residues is within the region thought to be the site of interactions with β-arrestin, and only two, F110W 3.28 and C118S 3.36, are near the agonist-binding site.
Surface expression was measured for each construct as described under "Experimental procedures" and in the figure legends and routinely remeasured in conjunction with TANGO assays in order to correct for differences and to allow determination of relative efficacies. All the mutants studied display levels of surface expression comparable with those of WT, although the results varied from day to day and between plasmid preparations.
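The correction for surface expression described above can be illustrated with a short sketch. It assumes that relative surface expression is expressed as a base-10 logarithm of the mutant/WT ratio and that a WT-relative response is corrected by dividing by that ratio; both the function name and the numbers are hypothetical and are not taken from the study's data.

```python
import math

def normalize_by_surface_expression(response_rel_wt: float, log_se_rel: float) -> float:
    """Correct a WT-relative response for relative surface expression.

    Assumption: log_se_rel is log10(mutant surface expression / WT surface
    expression), so the correction divides by 10 ** log_se_rel.
    """
    return response_rel_wt / (10 ** log_se_rel)

# Hypothetical example: a mutant expressed at twice the WT level that gives
# twice the WT response has a WT-like response per receptor.
corrected = normalize_by_surface_expression(2.0, math.log10(2.0))
print(f"corrected relative response: {corrected:.2f}")
```

A value of `log_se_rel` equal to 0 leaves the response unchanged, which corresponds to a mutant expressed at the WT level.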

Evolutionary Trace residues alter β-arrestin2 responses to dopamine
To test whether any of the D2R mutations fine-tune the response to dopamine, we generated TANGO assay dose-response curves to detect changes in potency and efficacy (Figs. 2 and 3; Tables 1 and 2). The EC50 value for dopamine acting on WT D2R in this assay, 56 nM, is similar to the value of 41 nM reported previously based on a bioluminescence resonance energy transfer assay that measures β-arrestin2 recruitment more directly on a shorter time scale (36), suggesting the assay used here is a faithful reporter of β-arrestin2 recruitment. The WT EC50 value is significantly higher than the values measured previously for activation of Gαi (6.6 nM (36) or 9.7 nM (23)), consistent with the idea that G-protein activation and β-arrestin activation depend on distinct receptor conformations with distinct affinities for dopamine. The results for the mutant D2R constructs revealed altered responses relative to WT, including both increases and decreases in dopamine potency, both increases and decreases in efficacy, and increases in constitutive activity.

Figure 1. Summary of mutations used to test allosteric regulation of β-arrestin2 recruitment to D2R. A, serpentine plot of D2R from GPCRdb with the residues of interest chosen based on difference-ET highlighted in purple. B, ET residues are shown in the structure of D2R as spheres (Cα atoms) and side chains, with hydrogens (PDB 6CM4, Wang et al. (43)). The spheres of the low ranked residues are in blue and the high ranked residues are in purple. The compound in cyan is the inverse agonist risperidone, located in the binding pocket. Superscripts represent the Ballesteros-Weinstein index for each residue.

Fine-tuning arrestin recruitment to D2R

Mutations that have WT-like dopamine potency
Twelve of the point mutants tested, including the three negative control residues, I105K 3.23, A188N 5.37, and I195F 5.44, had potencies similar to WT (Fig. 2, B and C; Tables 1 and 2). Of these mutants with WT-like potency, potency for V83L 2.53 was more than 5-fold higher in TANGO than in Gα16 assays (22) and that for T205M was more than 3-fold lower. Dopamine potency in Gαi assays (23) was higher than in TANGO assays for WT and all mutants except for N124H 3.42, for which the potency was more than 5-fold higher for TANGO, and Y199F 5.48, for which potency in TANGO was ~1.5-fold higher.
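The potency comparisons in this section rest on simple conversions between EC50 values, pEC50 values, and fold differences. As a minimal sketch (not part of the original analysis), the Python snippet below applies those conversions to the WT dopamine EC50 values quoted in the text (56 nM in TANGO, 41 nM by bioluminescence resonance energy transfer, and 6.6 or 9.7 nM for Gαi activation). The function names are illustrative.

```python
import math

def pec50(ec50_molar: float) -> float:
    """Return pEC50, defined as -log10 of the EC50 in mol/L."""
    return -math.log10(ec50_molar)

def fold_difference(ec50_a: float, ec50_b: float) -> float:
    """Return how many times larger EC50 a is than EC50 b (same units)."""
    return ec50_a / ec50_b

# WT dopamine EC50 values quoted in the text, in nM.
TANGO_NM = 56.0
BRET_NM = 41.0
GALPHA_I_NM = (6.6, 9.7)

print(f"TANGO pEC50: {pec50(TANGO_NM * 1e-9):.2f}")
print(f"TANGO vs BRET EC50 ratio: {fold_difference(TANGO_NM, BRET_NM):.2f}")
for g_nm in GALPHA_I_NM:
    ratio = fold_difference(TANGO_NM, g_nm)
    print(f"TANGO vs G-alpha-i ({g_nm} nM) EC50 ratio: {ratio:.1f}")
```

Because potency is inversely related to EC50, a higher EC50 ratio here means lower dopamine potency in TANGO than in the G-protein assay.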

Mutations that have altered dopamine potency
Four of the point mutants, F110W 3.28, C118S 3.36, F202L 5.51, and N418S 7.45, had significantly reduced dopamine potencies for β-arrestin2 recruitment. In previous G-protein activation assays, F202L 5.51 had lower dopamine potency for Gαi (23), whereas C118S 3.36 and N418S 7.45 had lower potency for Gα16 activation (22). The potency of dopamine for β-arrestin2 recruitment to I48T 1.46 was enhanced relative to that of WT but much lower than for Gαi activation.

T205M 5.54 mutant has constitutive activity, enhanced dopamine potency, and enhanced efficacy in β-arrestin2 recruitment
The most strikingly altered dose response was observed for T205M 5.54. This mutation confers β-arrestin2 recruitment activity in the absence of agonist that is even greater than the maximum agonist-stimulated activity of WT (Figs. 3, A and D, and 4), revealing a high level of constitutive activity.
The T205M 5.54 dopamine dose-response curve has a biphasic shape (Fig. 3) that could be explained by two (or possibly more) binding sites that enhance activity. Assuming that the higher affinity site corresponds to the orthosteric site and that the plateau of activity between 10^-7 and 10^-6 M dopamine corresponds to saturation of this site, the total activity increase stimulated by dopamine at the orthosteric site is higher than the total activity elicited by dopamine in WT (Fig. 3, A and C, and Table 1). The total activity at this plateau, including the constitutive activity, is close to three times that of WT and 65% higher than that of WT when the constitutive activity is subtracted, indicating a dramatic increase in the ability of this mutant receptor to recruit β-arrestin2, both with and without bound dopamine. The apparent potency of dopamine at the presumed orthosteric site (see results with a competitive ligand below for caveats) is also increased, with potency (from the fitted pEC50) nearly 4-fold higher than that of WT (Fig. 2 and Table 1). Increased dopamine potency and efficacy for this mutant were also observed for Gαi activation; however, there was no sign of constitutive activity or a second activating site affecting G-protein activation (23).

Figure legend (continued). Error bars not shown are smaller than the symbols. D, dopamine potency data from multiple experiments (n = 3-11 for mutants and 25 for WT). Points represent pEC50 values obtained from sigmoidal curve fits of raw data from the TANGO dose-response for each day tested. For T205M 5.54, which exhibited a biphasic dose-response, data were truncated at 10^-6 M for fitting to the sigmoidal curve. L379F 6.41 is not included because the TANGO response is similar to that of the negative control. The bars represent means ± S.E. of the points shown. All mutants were compared with WT using an unpaired two-tailed t test; ****, p < 0.0001; *, p < 0.05. Detailed information about the sample sizes and statistics is provided in Table 1.
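The sigmoidal fits and the truncation of the biphasic T205M 5.54 data at 10^-6 M mentioned above can be sketched as follows. This is an illustration under stated assumptions, not the authors' curve-fitting procedure: it uses a standard four-parameter logistic model, fixes the bottom and top plateaus, and finds pEC50 by a coarse grid search instead of nonlinear regression. All names and the synthetic data are hypothetical.

```python
import math

def hill_response(conc_molar, bottom, top, pec50, hill_slope=1.0):
    """Four-parameter logistic dose-response model.

    bottom: response without agonist (e.g. constitutive activity)
    top:    maximal plateau; pec50 = -log10(EC50 in mol/L)
    """
    log_c = math.log10(conc_molar)
    return bottom + (top - bottom) / (1.0 + 10 ** ((-pec50 - log_c) * hill_slope))

def truncate_for_fit(points, max_conc_molar=1e-6):
    """Keep (concentration, response) pairs at or below max_conc_molar."""
    return [(c, r) for c, r in points if c <= max_conc_molar]

def fit_pec50(points, bottom, top, lo=5.0, hi=9.0, step=0.01):
    """Grid-search the pEC50 that minimizes the sum of squared errors."""
    n = int(round((hi - lo) / step)) + 1
    candidates = [lo + i * step for i in range(n)]

    def sse(p):
        return sum((r - hill_response(c, bottom, top, p)) ** 2 for c, r in points)

    return min(candidates, key=sse)

# Synthetic, noise-free data at the final dopamine concentrations listed
# under "TANGO and cell-ELISA"; the parameter values are made up.
FINAL_CONCS = [1e-9, 1e-8, 1e-7, 10 ** -6.5, 1e-6, 1e-5]
demo = [(c, hill_response(c, 0.1, 1.0, 7.25)) for c in FINAL_CONCS]
fitted = fit_pec50(truncate_for_fit(demo), bottom=0.1, top=1.0)
print(f"fitted pEC50: {fitted:.2f}")
```

In practice the plateaus and slope would be fitted together with pEC50; fixing them here keeps the example short and dependency-free.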

Two mutants display reduced efficacy in β-arrestin2 recruitment by dopamine
The V83L 2.53 and L379F 6.41 mutations both reduced the maximal amplitudes of the responses over the concentration ranges tested. L379F 6.41 was essentially nonresponsive (Fig. 3, A and D). This mutation was previously found to be nonresponsive for G-protein activation as well (22,23), even though its ligand-binding affinity and specificity are very similar to those of WT. The potency of dopamine for L379F 6.41 could not be determined due to the very low or nonexistent maximal activity level, but that of V83L 2.53 was not significantly different from WT.

Quantification of bias
Although there is some danger of oversimplification in reducing multiple measurements of different components of signaling to single parameters, it can be useful for comparisons with the results from other studies to calculate them. We have employed three different measures (Fig. 4, A and B; see "Experimental procedures" for details). One, β_Emax/EC50, is based on the relative ratios of Emax and EC50. It is derived from the "equiactive" comparison (40) formulated for comparing ligands and modified for comparing mutants. The other two, β_Emax and β_pEC50, parameterize the efficacies and potencies separately so they can be compared as points in a two-dimensional vector space. The results shown in Fig. 4, A and B, reveal diverse effects of the mutations, displaying enhanced bias toward either G-protein signaling or β-arrestin recruitment depending on both the site of mutation and the measure compared.
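To make the Emax/EC50-based measure concrete, the sketch below computes one plausible form of such a bias factor: the change in log10(Emax/EC50) of a mutant relative to WT for β-arrestin recruitment, minus the same change for G-protein activation, so that positive values indicate bias toward β-arrestin recruitment. The exact formula used in the study is given under "Experimental procedures" and may differ; this form, the function names, and the example numbers are assumptions for illustration only. The error helper combines independent standard errors in quadrature, the usual rule for sums and differences.

```python
import math

def log_activity(emax: float, ec50: float) -> float:
    """log10(Emax / EC50) for one receptor in one assay."""
    return math.log10(emax / ec50)

def equiactive_bias(arr_mut, arr_wt, g_mut, g_wt) -> float:
    """Assumed bias factor; each argument is an (Emax, EC50) tuple.

    Positive values indicate bias toward beta-arrestin recruitment,
    negative values bias toward G-protein activation.
    """
    delta_arr = log_activity(*arr_mut) - log_activity(*arr_wt)
    delta_g = log_activity(*g_mut) - log_activity(*g_wt)
    return delta_arr - delta_g

def propagate_se(*standard_errors: float) -> float:
    """Standard error of a sum or difference of independent terms."""
    return math.sqrt(sum(se ** 2 for se in standard_errors))

# Hypothetical mutant: 10-fold more potent than WT for beta-arrestin
# recruitment, unchanged for G-protein activation.
bias = equiactive_bias(
    arr_mut=(1.0, 5.6e-9), arr_wt=(1.0, 56e-9),
    g_mut=(1.0, 6.6e-9), g_wt=(1.0, 6.6e-9),
)
print(f"bias factor: {bias:.2f}")
```

By construction the WT receptor has a bias factor of 0, so each mutant is scored only by how its two pathways diverge from WT.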

Paired mutations at coupled residue positions
Previously, the mutation pairs I48T 1.46/F110W 3.28 and T205M 5.54/L379F 6.41 had been shown to be co-varying in evolution and functionally coupled (23). Thr-205 5.54/Leu-379 6.41 are only 4 Å apart, so presumably their interactions are direct. However, Ile-48 1.46/Phe-110 3.28 are too distant to be in direct contact, implying allosteric interactions. Of these pairs, T205M 5.54/L379F 6.41 is particularly interesting because the two individual mutations yielded the most dramatic phenotypes, with a striking gain-of-function for β-arrestin2 recruitment in the case of T205M 5.54 and a complete loss of function in the case of L379F 6.41. The results of the double mutations are shown in Fig. 4, C-H, and Table 1. The addition of the T205M 5.54 swap rescued the loss of function induced by L379F 6.41, resulting in a receptor variant with dopamine potency and efficacy similar to those of WT. For F110W 3.28 and I48T 1.46, the results for potency and efficacy of the single and double mutants were all within statistical uncertainty of WT values. These results contrast with observations for Gαi activation, in which F110W 3.28 displayed decreased efficacy and potency, whereas I48T 1.46 and the double mutant displayed greatly enhanced efficacy and potency. In that case, combining the mutations restored dopamine potency to a level intermediate between those of the individual mutants and closer to that of WT, but yielded efficacy resembling that of I48T 1.46 (23). For T205M 5.54, the double mutation abolished its constitutive activity, and in both cases, the double mutations eliminated the β-arrestin2 recruitment activity elicited by serotonin in the single mutants I48T 1.46 and T205M 5.54 (Fig. 5B). The "radar" plots in Fig. 4, C-H, depict the multidimensionality of mutation effects on signaling behavior. For example, I48T 1.46 displays enhanced G-protein efficacy and potency, and it displays enhanced potency but slightly reduced efficacy for β-arrestin recruitment, without effect on its intrinsic dopamine-binding affinity. In contrast, T205M 5.54 signals more strongly in both pathways by all six measures plotted.

Figure legend (continued). Raw intensity values are shown, uncorrected for surface expression levels relative to WT, which were: logSErel (± S.D.) = -0.1 ± 0.18 (T205M), 0.71 ± 0.33 (L379F), and 0.56 ± 0.47 (V83L). C, constitutive activity was determined from the bottom plateau of dose-response curve fits, except for L379F 6.41 where the response with no drug added was used, relative to WT, and normalized by mean surface expression. Points show constitutive activity values of biological replicates (i.e. single-day experiments; n = 3-6 for mutants and 38 for WT), and bars represent means ± S.E. D, dopamine efficacy. Points represent the maximal response relative to WT on individual days, based on the maximal values of the best-fit curves for the dose-response data, normalized for mean surface expression relative to WT. For the inactive mutant, L379F 6.41, the response to 10^-5 M dopamine is plotted. The bars represent the mean values ± S.E. over multiple biological replicates, handled and plotted as in C. In both C and D, for those values that appeared different at the p < 0.05 level by two-tailed unpaired t test, values were tested at the positive or negative extreme of the 95% confidence level for mean log(relative surface expression) and scored as significantly different (*) only if the differences were retained when re-normalized for those surface expression values. Detailed information about the sample sizes and statistics is provided in Table 1.

Table 1. Potency and efficacy of dopamine in β-arrestin2 recruitment to D2R mutants. TANGO assay results for detecting β-arrestin2 recruitment to D2R variants are shown. The nonlinear regression curve-fitting analysis and statistics are described under "Experimental procedures." The Erel p values with asterisks indicate values that differ from WT over the 95% confidence limit range for log(mean relative surface expression).

Serotonin responses
Although the residue positions for mutations were chosen according to evolutionary importance, the identities of the substitutions were chosen as the corresponding amino acid residues in the 5-HT2A receptor. Therefore, it was of interest to test whether any of the mutations altered serotonin responses, as some did with respect to G-protein activation (22,23). In contrast to Gα16 activation, in which serotonin elicits weak but readily measurable responses from WT D2R (22), β-arrestin2 recruitment by WT D2R was not detected below 10 µM serotonin, and even at high concentrations it was very weak (Fig. 5). Only two mutants, I48T 1.46 (pEC50 ± S.E., 6.27 ± 0.131, n = 4) and T205M 5.54 (pEC50 ± S.E., 6.87 ± 0.082, n = 4), displayed substantial and saturable responses to serotonin. These mutants had previously shown the strongest serotonin-induced responses in G-protein activation (22,23). These results suggest that most of the residues tested here, although important for fine tuning the sensitivity of β-arrestin2 signaling to dopamine, do not play an important role in the ligand specificity of those responses.

Figure 4 (legend). A and B, bias factors computed from relative activities of D2R mutants in response to dopamine for β-arrestin (β-arr) recruitment (TANGO assay, this study) and Gαi activation (membrane potential assay, Sung et al. (23)). A, equiactive comparison (40) calculated from relative Emax/EC50 values for the two assays. B, single measure bias factors calculated separately for Emax (x axis) and pEC50 (y axis). Error bars were derived from the S.E. of the two measurements using error propagation. Positive and negative values indicate bias toward β-arrestin recruitment or Gαi activation, respectively. Details of the bias factor calculations are provided under "Experimental procedures." C-H, radar plots (generated using R) for I48T 1.46 (C), F110W 3.28 (D), I48T 1.46/F110W 3.28 (E), T205M 5.54 (F), L379F 6.41 (G), and T205M 5.54/L379F 6.41 (H) represent multiassay data after dopamine stimulation. The intersections of the lines outlining the inner magenta-shaded region with the radial axes denote WT activity, and those outlining the green-shaded regions depict activities of mutants. Data shown are as follows: Gαi pEC50; Gαi Erel; β-arrestin2 (ARRB2) pEC50; ARRB2 constitutive activity (CA); ARRB2 Erel; and dopamine (DA) binding pKi. The G-protein (G prot) and pKi data were obtained from Sung et al. (23) and Rodriguez et al. (22). All the radar plots share the same axes. The same maximum and minimum values were used for both ARRB2 and G-protein pEC50 and Erel to allow for comparison of values. The pEC50 axis origin is 6 and the outermost line corresponds to 7.8. The Erel axis origin is 0 and the outermost line is 3.9. The constitutive activity axes range from -0.03 to 1.44, and the pKi axes range from 3 to 5.7. All values are listed in Table 2.

Responses to biased agonists and an antagonist
We measured responses (Fig. 6) of WT D2R and four of the more interesting mutants, V83L 2.53, I48T 1.46, L171P 4.61, and T205M 5.54, to two drugs with previously reported signaling biases: UNC9994, a β-arrestin-biased agonist (20), and MLS1547, a G-protein-biased agonist (41). WT D2R responded to UNC9994 in the TANGO assay as reported previously (20), with an EC50 value of 4.4 nM, in agreement with the previous report of 6.1 nM. In contrast, UNC9994 was more potent in stimulating β-arrestin2 recruitment for V83L 2.53 and I48T 1.46, less potent for L171P 4.61 (an accurate EC50 value could not be calculated due to lack of saturation), and more efficacious for T205M 5.54. Surprisingly, MLS1547 elicited robust β-arrestin2 recruitment activity in WT (about half the maximum activity observed with dopamine) and even higher responses relative to dopamine for T205M 5.54 and I48T 1.46. In contrast, responses were much lower for V83L 2.53 and nearly undetectable for L171P 4.61. We also checked whether quetiapine, an antagonist of dopamine previously characterized as a partial agonist and widely used as an antipsychotic, could block the constitutive activity or dopamine responses of the constitutively active T205M 5.54. As reported previously (42), quetiapine antagonized the β-arrestin2 recruitment activity of WT D2R. Surprisingly, it not only failed to block constitutive activity but also acted as a fairly potent agonist for T205M 5.54 and actually potentiated its response to dopamine (Fig. 7), again suggesting the possibility of more than one site of agonist action on this mutant. Taken together, the results with these drugs suggest that D2R variants could lead to dramatic differences in drug responses in patients with respect to β-arrestin2 recruitment.

Mutations at ET-identified residues result in perturbed β-arrestin signaling
The experiments described here were carried out to test the hypothesis that residues of evolutionary importance, as scored by the Evolutionary Trace algorithm, are important contributors to β-arrestin2 signaling as well as G-protein signaling. The observation that seven of the 14 mutations tested at positions of high ET rank had significant impacts on constitutive activity, potency, efficacy, or specificity, while none of the three mutations tested at positions of low ET rank had significant effects, supports this hypothesis.
The results also support the hypothesis that the allosteric pathways mediating G-protein activation and β-arrestin2 recruitment are distinct, albeit overlapping at some points. For example, Y213I 5.62 leads to a 3-fold increase in dopamine potency for Gαi-protein activation (23), but no change in dopamine potency and a slight (and nonsignificant) decrease in dopamine efficacy for β-arrestin2 recruitment. N124H 3.42, Y199F 5.48, and the double mutation I48T 1.46/F110W 3.28 all result in substantial decreases in dopamine potency for Gαi-protein activation, of 572-, 17-, and 6-fold, respectively, whereas the potencies for β-arrestin2 recruitment do not differ significantly from that of WT. Three mutations, I48T 1.46, I105K 3.23, and F110W 3.28, all increase G-protein activation efficacy substantially (23), but yield no significant change in efficacy for β-arrestin2 recruitment. F202L 5.51 had reduced G-protein activation efficacy and potency (23), but only reduced potency for β-arrestin2 recruitment. Thus, particular mutations can have functional consequences that differ in sign or magnitude, depending on the downstream allosteric pathway. Of the 14 residues with high ET ranks whose swaps are studied here, only a subset had significant effects on β-arrestin2 recruitment, but all had detectable effects on at least one assay for β-arrestin2 recruitment, G-protein signaling, or endogenous ligand affinity/potency.

Structural basis of mutant phenotypes
The lowered dopamine potency of C118S 3.36 might be explained in terms of its adjacency to the agonist-binding pocket (Fig. 8). Cys-118 3.36 forms part of the risperidone-binding pocket in the D2R structure, and its mutation to alanine was found to decrease risperidone affinity significantly (43). The C118S 3.36 mutant also had reduced affinity for dopamine (22). Mutations have previously been identified in the third transmembrane helix of D2R that significantly reduce β-arrestin2 (D2R A135R 3.53/M140D 3.58) or G-protein (D2R L125N 3.43/Y133L 3.51) signaling (36), and these mutations might be disrupting coupled interactions involving TM3, which is shifted outward toward the extracellular end in the structure of constitutively active rhodopsin bound to arrestin, as compared with inactive rhodopsin (44).
The structure of constitutively active rhodopsin bound to arrestin compared with that of the inactive form of rhodopsin (44) reveals major shifts in TM6, and it seems likely that the large effects of L379F 6.41 on efficacy are due to its interference with the TM6 conformational shift. It is intriguing that the N418S 7.45 mutation affects ligand potency yet is far from the ligand-binding site in the structure. The C-terminal end of TM7 interacts with arrestin in that structure, and N418S 7.45 is immediately adjacent to the break in TM7, suggesting that it might disrupt coupling between the agonist-binding site and the β-arrestin2-binding interface, leading to reduced dopamine potency through allosteric coupling. TM5 in the rhodopsin-arrestin complex is shifted outward toward the extracellular side and lengthens at the intracellular side, compared with inactive rhodopsin, and three mutations with significant alterations in dopamine potency or efficacy for β-arrestin2 recruitment are found in this helix, including the T205M 5.54 mutation, which leads to dramatic gain-of-function.

minimum side-chain-to-side-chain distance without hydrogens, or 2.3 Å with hydrogens present, and pointing toward one another in the D2R structure. It is not surprising that replacing the polar threonine with the nonpolar methionine and leucine with the more bulky phenylalanine in this inter-helical space would be disruptive, but it is hard to understand how making both mutations simultaneously overcomes the disruption and restores near-WT function. Similar results were observed with Gαi activation, in which T205M 5.54 paired with either L379F 6.41 or N124H 3.42 resulted in functional coupling to restore WT-like function (23).
I48T 1.46, which confers serotonin responsiveness in β-arrestin2 recruitment, is roughly in the middle of TM1, which undergoes an outward shift toward the cytoplasmic end in the rhodopsin-arrestin complex. Its effects might involve proximal coupling to the agonist-binding site. It was previously reported that the mutation I48C 1.46 (going from the nonpolar isoleucine to the polar cysteine) increased the Kd value for N-methylspiperone by about 2-fold, and it increased affinity for sulpiride by about the same factor, consistent with our observations of the effects of this site on ligand specificity (45). When I48T 1.46 is mutated in the presence of F110W 3.28, the serotonin response is lost, possibly due to compensatory effects of Phe-110 3.28 at the ligand-binding pocket. Phe-110 3.28 corresponds to Leu-111 3.28 in the D4 receptor, which has high affinity for the agonist nemonapride, and to Phe-110 3.28 in D3, which, like D2, does not (46). A mutation of the corresponding residue in rhodopsin, E113Q 3.28, along with the M257Y 6.40 mutation (at a position immediately adjacent to our Leu-379 6.41 site), yields a constitutively active form that was used to obtain a structure of the rhodopsin-arrestin complex (44). Because threonine is at that position in 5-HT2A, it makes sense that this substitution would increase serotonin potency in D2R.

Implications for therapeutic design
Drugs that treat mood disorders and target D2R and/or 5-HT2A tend to lack effector pathway selectivity, resulting in significant side effects (47-49). Our findings should help to understand the structural mechanisms in D2R that allosterically regulate effector pathway activation and signaling bias for use in rational drug design. The observation that sites well removed from both the ligand-binding site and the cytoplasmic surface where effectors interact can alter signaling suggests that there are opportunities for development of biased allosteric modulators as well as for biased agonists and antagonists. Our data suggest that ET residues affect signal modulation between the β-arrestin2 and Gαi-protein pathways, and one plausible explanation is signaling bias.

Cell culture
The HTLA cell line (a HEK293 cell line stably expressing a tetracycline-controlled transactivator (tTA)-dependent luciferase reporter and a β-arrestin2-TEV fusion gene) was obtained as a generous gift from the Bryan Roth lab and cultured in 10-cm plates as described (39).

D2R cloning and transfections
D2R cDNA templates (Missouri S&T cDNA Resource Center, catalog no. DRD020TN00), previously mutated and published (22,23), were PCR-amplified utilizing KOD Hot Start DNA Polymerase and primers containing flanking ClaI restriction sites to generate D2R mutants in the TANGO backbone (Addgene catalog no. 66269) (38,39). The resulting plasmids express a protein containing an N-terminal HA signal sequence, followed by a FLAG tag, and then D2R; at its C terminus, D2R is fused to the V2 vasopressin receptor tail, a TEV cleavage site, and the tTA transcription factor. Each D2R variant contained one or two difference ET-identified 5-HT2A residue swaps. The plasmid was expressed in HTLA cells at ~80% confluency by transfecting 2 µg of DNA with 10 µl of Lipofectamine 2000 in a final volume of 500 µl of Opti-MEM (Thermo Fisher Scientific) per well in a 6-well plate where the cells were grown in DMEM with 10% (v/v) FBS. The cells from the 6-well plates were later collected and re-plated in 96-well plates as explained below under "TANGO and cell-ELISA."

Figure legend (fragment). ... (43)). The WT residues are depicted in purple, and the mutagenized residues are depicted in green. The residues depicted both before and after mutagenesis using Chimera are I48T 1 ...

TANGO and cell-ELISA
We used the transcriptional activation following arrestin translocation (TANGO) assay to measure recruitment of β-arrestin2 to D2R (39). Upon recruitment of the β-arrestin2-TEV protease fusion protein, the D2R fusion is cleaved at the TEV cleavage site, releasing tTA and promoting luciferase expression. HTLA cells transfected with TANGO constructs were trypsinized at 12-15 h after transfection in 6-well plates, resuspended in 10 ml per well of DMEM containing 1% (v/v) dialyzed FBS (Omega Scientific) with 50 units/ml penicillin and 50 μg/ml streptomycin (Lonza), and replated at 100 μl per well in a 96-well plate. Cells were allowed to adhere for 7 h, and then 50 μl of 3× drug solution or vehicle control was added. Drugs used were dopamine-HCl (Sigma catalog no. H8502), UNC9994 (Axon Medchem catalog no. 2562), MLS1547 (Sigma catalog no. SML1331), and quetiapine hemifumarate salt (Sigma catalog no. Q3638). Dopamine was dissolved in H2O as 10 and 100 mM stock solutions, and all other ligands were dissolved in DMSO as 10 mM stock solutions. Final drug concentrations were $10^{-9}$, $10^{-8}$, $10^{-7}$, $10^{-6.5}$, $10^{-6}$, and $10^{-5}$ M, and in some cases $10^{-4.8}$ M, and treatments were performed in triplicate wells. The plates were removed from the 37°C incubator 18-20 h after adding drug; wells were washed once with 200 μl of PBS, and then 100 μl of BrightGlo (Promega), which contains the luciferase substrate and cell lysis reagent, was added to each well. The plate was incubated for 20 min prior to starting the reading, and then luminescence was measured at 570 nm using a FlexStation III (Molecular Devices) at a standard read time per plate of ~2 min. Surface expression for comparison of dopamine and serotonin responses was determined using a cell-ELISA protocol, as described previously (23), except that the primary antibody was rabbit anti-DYKDDDDK (Cell Signaling, catalog no. 2368), added to a final concentration of 0.257 μg/ml.
Surface expression for the responses to UNC9994, MLS1547, and quetiapine was determined using a cell-ELISA protocol adapted from Refs. 23 and 50. The primary antibody was mouse monoclonal anti-DYKDDDDK (Sigma, catalog no. A8592), added to a final concentration of 3.333 μg/ml.
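The dosing step in the TANGO protocol (50 μl of a 3× drug solution added to 100 μl of cells) implies a threefold dilution in the well. The short Python sketch below only illustrates that arithmetic; the function and constant names are hypothetical and are not part of the published protocol.

```python
# Sketch of the dilution arithmetic for the TANGO dosing step.
# Volumes are taken from the text; all names are illustrative assumptions.
WELL_VOLUME_UL = 100.0  # cell suspension already in each well
DRUG_VOLUME_UL = 50.0   # volume of concentrated drug solution added

# Final log10 molar concentrations listed in the text.
FINAL_LOG_M = [-9, -8, -7, -6.5, -6, -5]


def working_concentration(final_molar):
    """Concentration of the added solution needed to reach final_molar in the well."""
    total_volume = WELL_VOLUME_UL + DRUG_VOLUME_UL
    return final_molar * total_volume / DRUG_VOLUME_UL


for log_c in FINAL_LOG_M:
    final = 10.0 ** log_c
    print(f"final {final:.3g} M -> add {working_concentration(final):.3g} M")
```

With these volumes the added solution is three times the final concentration, which is consistent with the "3×" label in the protocol.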

Data analysis
For potency determinations, raw TANGO data from each experiment, including internal replicates, were fit to a single-site saturation dose-response curve: $T(C) = T_0 + (T_{max} - T_0)\,C/(C + EC_{50})$, where $C$ is the concentration of dopamine or serotonin; $T$ is the TANGO chemiluminescence signal; and $T_{max}$, $T_0$, and $EC_{50}$ are fit parameters. For the case of T205M 5.54, which displayed evidence for a second site, a similar two-site binding curve, with two additional fit parameters ($T_{max,2}$ and $EC_{50,2}$), was also tried, and for the single-site fit, only data up to the first plateau ($10^{-6}$ M) were included. Wells with no drug treatment were included in the fits and plots by assigning them a nominal concentration of $10^{-13}$ M. To facilitate visual comparison of mutants in Fig. 2, A-C, data were rescaled to the top and bottom plateaus of the curve fits. $EC_{50}$ and $pEC_{50}$ ($-\log_{10}(EC_{50})$) values for each mutant and WT were averaged across three or more separate experiments on different days as reported in Table 1.
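To make the single-site model concrete, the following Python sketch evaluates the curve and recovers its parameters from synthetic data. It is not the GraphPad Prism procedure used for the reported fits: instead of Levenberg-Marquardt it uses a plain grid search over log10(EC50), solving for the two plateaus by linear least squares at each grid point. The function names, grid limits, and synthetic values are all assumptions made for illustration.

```python
import math


def single_site(c, t0, tmax, ec50):
    """Single-site saturation curve: T(C) = T0 + (Tmax - T0) * C / (C + EC50)."""
    return t0 + (tmax - t0) * c / (c + ec50)


def fit_single_site(conc, signal, log_lo=-10.0, log_hi=-3.0, steps=701):
    """Grid search over log10(EC50); T0 and Tmax come from linear least squares.

    This is a simple stand-in for a nonlinear fit, not Prism's algorithm.
    """
    n = len(conc)
    mean_y = sum(signal) / n
    best = None
    for k in range(steps):
        ec50 = 10.0 ** (log_lo + (log_hi - log_lo) * k / (steps - 1))
        x = [c / (c + ec50) for c in conc]  # fractional response at each dose
        mean_x = sum(x) / n
        sxx = sum((xi - mean_x) ** 2 for xi in x)
        if sxx == 0.0:
            continue
        sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, signal))
        span = sxy / sxx                    # estimate of Tmax - T0
        t0 = mean_y - span * mean_x
        sse = sum((yi - (t0 + span * xi)) ** 2 for xi, yi in zip(x, signal))
        if best is None or sse < best[0]:
            best = (sse, t0, t0 + span, ec50)
    _, t0, tmax, ec50 = best
    return t0, tmax, ec50, -math.log10(ec50)


# Synthetic, noise-free example; untreated wells get the nominal 1e-13 M
# concentration described in the text.
doses = [1e-13, 1e-9, 1e-8, 1e-7, 10 ** -6.5, 1e-6, 1e-5]
signals = [single_site(c, 100.0, 1100.0, 1e-7) for c in doses]
t0_fit, tmax_fit, ec50_fit, pec50_fit = fit_single_site(doses, signals)
print(f"T0={t0_fit:.1f} Tmax={tmax_fit:.1f} pEC50={pec50_fit:.2f}")
```

The grid resolution (0.01 log units here) limits the precision of the recovered pEC50, so a proper nonlinear regression remains preferable for real data.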
For quantification of efficacy and constitutive activity, TANGO dose-response curves were normalized to internal controls from the same plate as follows. Internal replicates were averaged, and for each drug concentration the values from corresponding wells transfected with pcDNA3.1 empty vector were subtracted. Resulting data for WT D2R were then fit with a sigmoidal dose-response curve (as above), and the span between the baseline and the top plateau of that fit ($T_{max}$) was compared with the fits for mutants on the same plate to yield $E_0$, the relative efficacy without consideration of relative surface expression, $E_0 = T_{max,mut}/T_{max,WT}$, and for constitutive activity, CA, $CA_0 = T_{0,mut}/T_{max,WT}$. Cell-ELISA values for surface expression (SE) were normalized to internal controls from the same plate in a similar manner: $SE_{rel} = (SE_{mut} - SE_0)/(SE_{WT} - SE_0)$, where SE is the cell-ELISA luminescence signal, and $SE_0$ is the value for the cells transfected with the empty vector control.
The $SE_{rel}$ values from at least three independent experiments carried out using the same plasmid constructs were averaged and used to determine the final relative efficacy and constitutive activity values corrected for differences in surface expression: $E_{rel} = E_0/SE_{rel}$, and $CA_{rel} = CA_0/SE_{rel}$. These values are plotted in Fig. 3, C and D, and reported in Table 1. The sample standard deviations from averaging the $E_{rel}$ values across three or more independent experiments on separate days were used to calculate the standard error of the mean, which is reported as the uncertainty in Table 1. In Fig. 3, C and D, the error bars represent the standard error of the mean of all the individual experiments, and the value for each individual experiment is plotted.
For L379F 6.41, which is largely unresponsive to dopamine, reliable values of $EC_{50}$ and $T_{max}$ could not be obtained, so relative efficacy was calculated for each day's experiments using the values measured at $10^{-5}$ M dopamine instead of $T_{max}$.
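The normalization steps described above can be summarized in a few lines of Python. This is a sketch under stated assumptions: the function names are hypothetical, the numerical inputs are invented for illustration and are not data from this study, and the empty-vector subtraction of the TANGO signal is assumed to have been done already.

```python
import math
from statistics import mean, stdev


def relative_surface_expression(se_mut, se_wt, se_empty):
    """SE_rel = (SE_mut - SE_0) / (SE_WT - SE_0), SE_0 being the empty-vector signal."""
    return (se_mut - se_empty) / (se_wt - se_empty)


def relative_efficacy(tmax_mut, tmax_wt, se_rel):
    """E_rel = E_0 / SE_rel, where E_0 = Tmax_mut / Tmax_WT."""
    return (tmax_mut / tmax_wt) / se_rel


def relative_constitutive_activity(t0_mut, tmax_wt, se_rel):
    """CA_rel = CA_0 / SE_rel, where CA_0 = T0_mut / Tmax_WT."""
    return (t0_mut / tmax_wt) / se_rel


def standard_error(values):
    """Standard error of the mean from the sample standard deviation."""
    return stdev(values) / math.sqrt(len(values))


# Invented example: mean SE_rel from three cell-ELISA experiments, then
# per-day efficacy values corrected by that mean.
se_rel_mean = mean([
    relative_surface_expression(600.0, 1100.0, 100.0),
    relative_surface_expression(650.0, 1150.0, 110.0),
    relative_surface_expression(580.0, 1050.0, 90.0),
])
daily_e_rel = [
    relative_efficacy(tm, tw, se_rel_mean)
    for tm, tw in [(800.0, 1000.0), (760.0, 980.0), (850.0, 1020.0)]
]
print(f"E_rel = {mean(daily_e_rel):.2f} +/- {standard_error(daily_e_rel):.2f}")
```

For a variant such as L379F 6.41, the signal measured at the highest dopamine concentration would be passed in place of the fitted top plateau, as described in the text.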
For testing the biased agonists UNC9994 and MLS1547, TANGO assays and cell-surface assays were carried out and analyzed as described above. For quantifying relative efficacy, the span of the best-fit curve for the TANGO signal was normalized to the span for dopamine responses carried out in parallel.
All curve fitting was carried out using the Levenberg-Marquardt algorithm for nonlinear least-squares regression as implemented in GraphPad Prism version 8. Values of $pEC_{50}$ and $E_{rel}$ for each variant were tested for differences from WT values using a two-tailed unpaired t test; the underlying assumption is that these values reflect intrinsic properties of each variant and that the variance results from measurement uncertainties. The null hypothesis tested is that the properties for each variant are indistinguishable from those of WT within the measurement uncertainties, and the alternative hypothesis is that they are measurably different. For $E_{rel}$ comparisons, each value was normalized by its mean level of surface expression as compared with WT before comparison, and for those that appeared different at the p < 0.05 level, values were tested at the positive or negative extreme of the 95% confidence level for mean log(relative surface expression) and scored as significantly different only if the differences were retained when re-normalized for those surface expression values.
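As an illustration of the statistical comparison, the sketch below computes a two-tailed unpaired t test in plain Python. It assumes the pooled-variance (Student) form of the test, which the text does not specify, and it obtains the p value by numerically integrating the t density rather than by calling a statistics package. The pEC50 values in the example are invented.

```python
import math
from statistics import mean, variance


def t_pdf(x, df):
    """Probability density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def two_tailed_p(t, df, steps=2000):
    """Two-tailed p value via Simpson integration of the density over [0, |t|]."""
    upper = abs(t)
    if upper == 0.0:
        return 1.0
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return max(0.0, 1.0 - 2.0 * area)


def unpaired_t_test(a, b):
    """Pooled-variance unpaired t test; returns (t, degrees of freedom, p)."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))
    return t, df, two_tailed_p(t, df)


# Invented pEC50 values from three experiments each (not data from this study).
wt_pec50 = [7.10, 7.00, 7.05]
mut_pec50 = [7.80, 7.90, 7.70]
t_stat, dof, p_value = unpaired_t_test(mut_pec50, wt_pec50)
print(f"t = {t_stat:.2f}, df = {dof}, p = {p_value:.4f}")
```

Testing on pEC50 rather than EC50 is consistent with the text, since potencies are typically closer to normally distributed on a logarithmic scale.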

Bias parameters
Bias factors were calculated using the equi-active comparison method (40) adapted to compare mutants rather than ligands as shown in Equation 1, where P1 is β-arrestin recruitment; P2 is Gαi activation, and mut is the mutant of interest. To examine bias separately for different measured quantities, the following bias factors were used as shown in Equations 2 and 3.
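The equations themselves are not available in this text, so the following Python sketch should be read only as an assumption about their general shape. It uses the commonly cited form of the equi-active comparison, in which the ratio of maximal efficacy to EC50 is compared between two pathways and then referenced to a standard, with WT taking the place of the reference ligand. The function names and input values are hypothetical, and the exact expressions used for Equations 1-3 may differ.

```python
import math


def activity_ratio(emax, ec50):
    """Emax / EC50 for one pathway (hypothetical helper name)."""
    return emax / ec50


def bias_factor(mut_p1, mut_p2, wt_p1, wt_p2):
    """Assumed equi-active bias factor of a mutant relative to WT.

    Each argument is an (Emax, EC50) pair. P1 is beta-arrestin recruitment and
    P2 is G-alpha-i activation. Positive values indicate a shift toward P1.
    """
    mut = activity_ratio(*mut_p1) / activity_ratio(*mut_p2)
    wt = activity_ratio(*wt_p1) / activity_ratio(*wt_p2)
    return math.log10(mut / wt)


# Invented example: a mutant with 10-fold higher potency than WT in P1 and
# unchanged behavior in P2.
wt_arrestin, wt_gi = (1.0, 1e-7), (1.0, 1e-8)
mut_arrestin, mut_gi = (1.0, 1e-8), (1.0, 1e-8)
print(f"bias factor = {bias_factor(mut_arrestin, mut_gi, wt_arrestin, wt_gi):.2f}")
```

By construction, WT compared with itself gives a bias factor of zero under this form.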

Radar plots
Web representations (51) were used to compare the effect each mutation has in terms of preference for the Gαi effector pathway (22,23) versus the β-arrestin2 effector pathway. Plots were generated using the package fmsb in R.

Molecular graphics
Molecular graphics in Figs. 1 and 6 were generated from the referenced PDB files using Chimera version 1.11.2.