A Crystallographic Study of Bright Far-Red Fluorescent Protein mKate Reveals pH-induced cis-trans Isomerization of the Chromophore*

The far-red fluorescent protein mKate (λex, 588 nm; λem, 635 nm; chromophore-forming triad Met63-Tyr64-Gly65), originating from the wild-type red fluorescent progenitor eqFP578 (sea anemone Entacmaea quadricolor), is monomeric and characterized by a pronounced pH dependence of fluorescence, relatively high brightness, and high photostability. The protein has been crystallized at pH values ranging from 2 to 9 in three space groups, and four structures have been determined by x-ray crystallography at resolutions of 1.75–2.6 Å. The pH-dependent fluorescence of mKate has been shown to be due to reversible cis-trans isomerization of the chromophore phenolic ring. In the non-fluorescent state at pH 2.0, the chromophore of mKate is in the trans-isomeric form. The weakly fluorescent state of the protein at pH 4.2 is characterized by a mixture of trans and cis isomers. The chromophore in the highly fluorescent state at pH 7.0/9.0 adopts the cis form. Three key residues, Ser143, Leu174, and Arg197, residing in the vicinity of the chromophore, have been identified as being primarily responsible for the far-red shift in the spectra. A group of residues consisting of Val93, Arg122, Glu155, Arg157, Asp159, His169, Ile171, Asn173, Val192, Tyr194, and Val216 is most likely responsible for the observed monomeric state of the protein in solution.

Green fluorescent proteins (GFP) and GFP-like proteins (FP) have become important noninvasive tools for visualizing and monitoring internal processes within cells or whole organisms, such as gene expression, cellular pH, ion concentration, embryogenesis, inflammatory processes, protein trafficking, and the migration of parasites within a host (1-13). Fluorescent proteins can be used to visualize many types of cancer processes, including primary tumor growth, tumor cell motility and invasion, metastatic seeding and colonization, angiogenesis, and interactions between the tumor and its host microenvironment (14-16). FPs might be very useful in real-time testing of the efficacy of cancer drugs in animal models of human cancer.
The extensive spectral diversity of fluorescent proteins arises mostly from variations in the chemical structure of the mature chromophore and in the stereochemistry of its adjacent environment. The FP chromophore forms autocatalytically in vivo and in vitro from three residues, Xxx-Tyr-Gly, without need for any cofactors or enzymes, except for molecular oxygen (17). In most cases, the post-translational modification results in a blue/green emitting state, characterized by formation of an imidazolinone heterocycle with a p-hydroxybenzylidene substituent. Often, the reaction chain propagates further with formation of an additional N-acylimine double bond, which extends the conjugation of the chromophore electronic system and results in a bathochromic shift in the spectra (18-22).
Proteins that emit red, and especially far-red, light are of particular interest (13). Longer wavelength light extends the range of fluorescence resonance energy transfer (FRET)-based applications and, because of its lower energy, causes fewer damaging events to proteins and DNA. The most favorable "optical window" for visualization in living tissues is ~650-1100 nm (23). Light with a wavelength longer than 1100 nm is absorbed by water. Detection of fluorescence from proteins with emission peaks much shorter than 650 nm encounters the problem of interfering cellular autofluorescence. At present, the brightest red fluorescent proteins have emission maxima too far from the preferred "optical window." Besides, their excitation maxima are located in the range 550-560 nm, where living tissues are almost opaque and the fluorescence of these proteins cannot be effectively excited (see Table 1 in Ref. 13). Recently, far-red fluorescent variants, HcRed, mPlum, and AQ143, reaching the 650 nm barrier, have been developed (24-26). However, these proteins are characterized by low brightness, which strongly limits their practical application.
A majority of the wild-type GFP-like proteins form tetramers, which complicates their practical use. To be most useful in practical applications, the designed biomarkers should preferably exist in a monomeric form, emit in the far-red spectral range, have high brightness, be photostable, and exhibit a high rate of chromophore maturation. It has always been difficult to develop variants meeting all of these criteria simultaneously. Recently, however, the bright far-red dimeric variant Katushka and its monomeric version mKate, both characterized by the chromophore-forming sequence Met63-Tyr64-Gly65, have been successfully designed (13). Both proteins, characterized by similar spectral properties (λex, 588 nm; λem, 635 nm), are derived from the wild-type red fluorescent progenitor eqFP578 (Entacmaea quadricolor), which has spectral maxima of λex, 552 nm; λem, 578 nm. Both Katushka and mKate are significantly brighter than the spectrally close HcRed or mPlum (24,25) and display fast maturation, as well as high pH stability and photostability. The fluorescence of Katushka and mKate is pH-dependent, showing maximum emission at pH ~8, which gradually diminishes to zero at pH ~4. Compared with other far-red FPs, Katushka exhibits evident superiority for visualization in living tissues. The monomeric mKate is an excellent fluorescent label for monitoring fused proteins in whole organisms, multicolor labeling, and FRET applications.
We present here the results of crystallographic studies of three-dimensional structures of the far-red variant mKate in different fluorescent states, corresponding to pH 2.0, 4.2, and 7.0. These structures were solved at resolution ranging from 1.75 to 2.6 Å. We have analyzed in detail the stereochemical features in the chromophore area that are responsible for the outstanding spectral characteristics of the protein and its relatively high pH dependence of fluorescence, as well as the nature of surface residues responsible for the predominantly monomeric state of the protein.

EXPERIMENTAL PROCEDURES
The details of cloning, purification, and characterization of the studied proteins are presented in supplemental "Experimental Procedures."
Crystallization, Structure Solution, and Crystallographic Refinement-Crystals were obtained by the hanging drop vapor diffusion method under four different conditions. The mKate_pH2.0 crystals grew from 20% (w/v) PEG 3350, 0.2 M ammonium citrate tribasic, 0.4 M citric acid, pH 2.0, at an initial protein concentration of 12 mg/ml. The mKate_pH4.2 protein was crystallized from 17.5% (w/v) PEG 3350, 0.07 M citric acid, pH 4.2, at an initial protein concentration of 15 mg/ml. Crystals of mKate_pH7.0 and mKate_pH9.0 were grown from 20% (w/v) PEG 6000, 1 M LiCl, 0.1 M HEPES, pH 7.0, and from 20% (w/v) PEG 6000, 1 M LiCl, 0.1 M Bicine, pH 9.0, respectively. The initial protein concentration for both conditions was 18 mg/ml. X-ray diffraction data were collected from single crystals flash-cooled in a 100 K nitrogen stream. Prior to cooling, the crystals were transferred to a cryoprotecting solution containing 20% glycerol and 80% reservoir solution. Data were collected with a MAR300 CCD detector at the SER-CAT beamline 22ID (Advanced Photon Source, Argonne National Laboratory, Argonne, IL) and were processed with HKL2000 (27).
The crystal structure of mKate at pH 4.2 was solved by the molecular replacement method with MOLREP (28,29), using the coordinates of the eqFP611 monomer without the chromophore (71% sequence identity; PDB ID 1UIS (19)). The refined coordinates of mKate at pH 4.2 were used to solve the other mKate structures at pH 2.0, 7.0, and 9.0. Structure refinement was performed with REFMAC5 (30) and PHENIX (31), alternating with manual revision of the model using COOT (32). Water molecules were located with ARP/wARP (33). Noncrystallographic symmetry restraints were applied in refinement of the mKate_pH7.0 and mKate_pH9.0 structures, which have eight subunits in the asymmetric unit. The occupancy of each chromophore state was set to give the best possible agreement between the model and the difference electron density map. Crystallographic data and refinement statistics are presented in Table 1. Although the values of Rmerge were relatively high in the outermost shells of all data sets, the corresponding values of I/σ(I) indicated that these data were still significant.
Structure validation was performed with PROCHECK (34). The coordinates and structure factors were deposited in the Protein Data Bank under accession codes 3BX9 (mKate_pH2.0), 3BXA (mKate_pH4.2), and 3BXB (mKate_pH7.0). The structure at pH 9.0 (accession code 3BXC) was found to be almost identical to the structure at pH 7.0 and is omitted from the discussion.

RESULTS AND DISCUSSION
Electron Density Interpretation-The asymmetric unit in the mKate_pH2.0 and mKate_pH4.2 crystals contains one dimer. Crystallographic symmetry operations transform the dimers to the corresponding tetramers. The asymmetric unit in mKate_pH7.0 contains two tetramers. The electron density for all structures allowed unambiguous fitting of residues 2/3/4-228 for all monomers in the asymmetric unit. No density was observed for the N-terminal His tag fragment introduced into the expressed construct and used for protein purification. The relatively high resolution (~1.8 Å) of the low pH structures enabled us to detect alternative stable conformations for a number of side chains. We located between 259 and 351 hydrogen-bonded water molecules in the asymmetric unit of each crystal. Several citric acid and glycerol molecules (components of the crystallization and cryoprotectant solutions) were located in the mKate_pH2.0 and mKate_pH4.2 structures.
Monomer Structure-The principal structural fold of mKate is an 11-stranded β-barrel, closed on both sides by loop caps, with a chromophore (matured from the sequence Met63-Tyr64-Gly65) embedded in the middle of an internal α-helix that is wound along the β-barrel axis. The C-terminal tail (residues 222-228) has an irregular conformation and extends away from the β-barrel body. The R.M.S.D. values from pairwise superposition of the mKate monomer structures corresponding to different pH values are within the range of 0.32-0.41 Å for all equivalent Cα atoms, indicating a very similar fold of the monomers. Two cis peptide bonds, preceding Pro50 and Pro85 in the loop area, have been detected. Similarly to TurboGFP and the FPs from Zoanthus (22,35), the β-barrel frame of mKate has a pore, formed by the backbone of Trp140, Glu141, Ala142, Arg197, Arg198, and Leu199, leading to the hydroxyphenyl moiety of the chromophore. A chain of hydrogen-bonded water molecules, passing through the pore from the outside, could be identified in the mKate_pH2.0 and mKate_pH4.2 structures. Evdokimov et al. (35) suggested that this pore is essential for chromophore maturation, providing access for molecular oxygen.
Monomer Association-According to gel filtration data, mKate exists in solution in the monomeric state at concentrations as high as 10 mg/ml (13). However, in the crystalline state, which corresponds to a much higher protein concentration, mKate adopts at all pH values a tetrameric arrangement with 222 symmetry, typically seen in GFP-like proteins. The interacting surfaces of the subunits create two types of interfaces. Interface IF1 is located between two antiparallel monomers that form an "antiparallel" dimer (A-dimer), whereas the IF2 interface is found between two monomers belonging to adjacent A-dimers. Those monomers, positioned at ~75° with respect to each other, form a "crossed" dimer (C-dimer) (Table 2). The irregular C-terminal tail, consisting of residues 222-228, extends away from the β-barrel and packs against the cylindrical surface of the interacting counterpart, contributing to the IF2 contact surface.
The tetrameric assemblies of mKate in the crystal forms grown at different pH have similar topology but exhibit significant packing differences. Pairwise three-dimensional superposition of the tetramers over all equivalent Cα atoms gives the following R.M.S.D. values: mKate_pH7.0 and mKate_pH2.0, 1.85 Å; mKate_pH4.2 and mKate_pH2.0, 2.61 Å; mKate_pH4.2 and mKate_pH7.0, 4.30 Å.
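The pairwise superposition behind such R.M.S.D. values can be sketched as follows. This is a minimal illustration using the Kabsch algorithm with NumPy, and it is not necessarily the procedure or software used for the values quoted here. The function name `kabsch_rmsd` and the toy coordinates are assumptions introduced for the example, and the two inputs are assumed to be equivalent Cα atoms already matched one-to-one.

```python
import numpy as np


def kabsch_rmsd(P, Q):
    """Return the RMSD (in input units) of P onto Q after optimal rigid superposition.

    P and Q are (N, 3) arrays of equivalent atoms listed in the same order.
    """
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("P and Q must both have shape (N, 3)")

    # Remove the translational component by centering both coordinate sets.
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)

    # Covariance matrix and its singular value decomposition.
    H = Pc.T @ Qc
    U, _, Vt = np.linalg.svd(H)

    # Guard against an improper rotation (a reflection).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T

    # Rotate P onto Q and measure the residual deviation.
    diff = Pc @ R.T - Qc
    return float(np.sqrt((diff ** 2).sum() / len(P)))


# Toy data (hypothetical): a rotated and shifted copy of the same four points.
theta = np.radians(30.0)
rot_z = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                  [np.sin(theta), np.cos(theta), 0.0],
                  [0.0, 0.0, 1.0]])
toy = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [1.5, 1.5, 0.0], [0.0, 1.5, 2.0]])
moved = toy @ rot_z.T + np.array([5.0, -3.0, 2.0])
print(f"RMSD after superposition: {kabsch_rmsd(toy, moved):.6f}")
```

For real structures, the two arrays would hold the Cα coordinates of the tetramers being compared, taken in the same residue order.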
The IF1 interface within the A-dimer is noticeably weaker than the IF2 interface within the C-dimer. In the structures reported here, IF1 exhibits significant variation in its contact area, the number and composition of stabilizing interactions, and the angle between the antiparallel β-barrel axes (Table 2). In all crystal forms, the IF2 interfaces are more extensive and more uniform than the IF1 interfaces.
Although the monomeric mKate and the dimeric Katushka have similar spectral characteristics (λex, 588 nm; λem, 635 nm (13)), a comparison of their primary structures shows 18 differences (Fig. 1). Three of them, corresponding to positions 93, 122, and 155, are situated at the IF1 interface, whereas eight differences at positions 157, 159, 169, 171, 173, 192, 194, and 216 are found at the IF2 interface. At least some of these differences must be responsible for the observed variation of the oligomeric states of these proteins in solution. These 11 interface positions (highlighted in green for mKate in Fig. 1) are occupied by identical amino acids in Katushka and its wild-type progenitor eqFP578, both of which form dimers in solution (13,36).
TABLE 2 (fragment). IF1/IF2 values for three crystal forms: 690/1500, 300/1490, and 930/1490; the row label, units, and crystal-form column headings are not recoverable.
pH-induced cis-trans Isomerization of the Chromophore-The spectral properties of the far-red fluorescent protein mKate have been comprehensively investigated by Shcherbo et al. (13). The protein exhibits maximum emission at pH ~8, which gradually disappears at pH ~4 (Fig. 2a). In mKate, the post-translational modification of the chromophore-forming sequence Met63-Tyr64-Gly65 results in a conventional GFP-type two-ring conjugated core consisting of a five-membered imidazolinone heterocycle with a p-hydroxybenzylidene substituent. Similar to other red and far-red fluorescent proteins (18-20, 37, 38), the first chromophore residue Met63 in mKate is characterized by formation of a partially double N-acylimine bond, N=Cα, sp2 hybridization of the corresponding Cα atom, and the cis configuration of the preceding peptide bond. The additional N-acylimine bond apparently extends the chromophore-conjugated electronic system, resulting in a bathochromic shift in the spectra.
The unique feature of mKate revealed by this study is the observed pH-induced cis-trans isomerization of the chromophore Tyr64 phenolic ring with respect to the Cα-N bond. The predominant trans conformation of the phenolic ring (Fig. 3A) was detected in the mKate_pH2.0 crystal structure, which corresponds to the non-fluorescent (dark) state (Fig. 2). The difference electron density indicates the presence of ~10% of the cis isomer. In contrast, the highly fluorescent (bright) state of the mKate_pH7.0 structure is characterized by a mostly cis conformation of the phenolic ring, with ~10% contamination by the trans isomer in four out of eight independent subunits (Fig. 3C). The structure of mKate_pH4.2, exhibiting a low level of fluorescence, shows the presence of both the trans and cis isomers in a ratio of ~60% to ~40% in subunit A (Fig. 3B) and ~80% to ~20% in subunit B.
FIGURE 1. Sequence alignment of the wild-type eqFP578 and its mutant far-red variants, Katushka and mKate. Residue positions in the three-dimensional structure: $, chromophore-forming residues. Black and red font (highlighted in yellow), nearest chromophore environment (see Figs. 4 and 5). Red, chromophore environment; apparently responsible for the far-red shift. White font (highlighted in black), β-barrel cap area; presumably silent mutations or covariant with those in red. White font (highlighted in green), crystal intratetramer interfaces; apparently responsible for monomerization in solution. White font (highlighted in blue), presumably silent mutations in the random mutagenesis process.
OCTOBER 24, 2008 • VOLUME 283 • NUMBER 43
Both the trans and cis forms of the chromophore, representing the dark and the bright states, respectively, exhibit noticeable distortion from coplanarity of the imidazolinone and phenolic rings. In the two subunits of the mKate_pH2.0 crystal structure with the trans chromophore, the χ1 and χ2 torsion angles around the Cα=Cβ and Cβ-Cγ bonds of the tyrosine deviate only modestly from planarity: ~173° (a 7° deviation from the ideal planar form) and ~15°, respectively. In the eight subunits of the mKate_pH7.0 structure with a cis chromophore, these angles are ~1° and 25 ± 5°, respectively. In FPs, the two chromophore rings are generally more coplanar in the cis than in the trans arrangement (Table 3B in Ref. 39). A non-coplanar ring arrangement has mostly been associated with the non-fluorescent state. However, the observed coplanarity of the trans chromophore in eqFP611 (19) and the non-coplanarity of the cis chromophore in the fluorescent variant of Rtms5 (40) show that this is not always the case. In mKate, the non-coplanar cis chromophore (with a relatively large value of χ2) exhibits high fluorescence. We suggest that the energy difference between the nonplanar and planar chromophore conformations is small and could be overcome by light excitation. In other words, the nonplanar conformation of the cis chromophore observed in the crystals presumably corresponds to the resting state, which at a small energy expense may be transformed to a planar conformation in the excited fluorescent state. The bond angle Cα-Cβ-Cγ of Tyr64 observed in the trans mKate chromophore is 5° larger than that in the cis chromophore (~135° versus ~130°). This difference, as well as the deviation from planarity of the trans chromophore, apparently arises from steric repulsion between the Cδ and carbonyl O atoms of Tyr64.
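The torsion and bond angles discussed above can be computed from atomic coordinates as in the following sketch. It is a generic geometric calculation offered for illustration, not the authors' analysis script; the function names and the test coordinates are hypothetical, and the sign of the torsion follows the usual convention of measuring the rotation of the far bond relative to the near bond when looking along the central bond.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def torsion_deg(p0, p1, p2, p3):
    """Signed torsion angle in degrees about the p1-p2 bond, within [-180, 180]."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


def bond_angle_deg(a, b, c):
    """Angle in degrees at atom b formed by the bonds b-a and b-c."""
    v1, v2 = _sub(a, b), _sub(c, b)
    cos_t = _dot(v1, v2) / math.sqrt(_dot(v1, v1) * _dot(v2, v2))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))


def planarity_deviation_deg(torsion):
    """Smallest deviation of a torsion angle from an ideal planar value (0 or 180 degrees)."""
    t = abs(torsion) % 360.0
    if t > 180.0:
        t = 360.0 - t
    return min(t, 180.0 - t)


# Torsion values quoted in the text for the trans chromophore (about 173 and 15 degrees).
for quoted in (173.0, 15.0):
    print(f"torsion {quoted:.0f} deg -> deviation from planarity {planarity_deviation_deg(quoted):.0f} deg")
```

With coordinates from a deposited model, `torsion_deg` would be called with the four atoms defining each torsion, in bonded order, and `bond_angle_deg` with the Cα, Cβ, and Cγ atoms of Tyr64.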

Different geometric restraint schemes were tested to determine the optimal geometry of the group (63)Cα=N-C(O)-Cα(62) bridging the Cα atoms of the first chromophore residue Met63 and the preceding Phe62. Similar to HcRed (38), at optimal fit to the electron density it exhibits considerable deviation from planarity, with the torsion angle around the quasi-peptide N-C(O) bond in the range 20-35°. Moreover, similarly to other red and far-red fluorescent proteins (18,19,22,38), the C(O)-N-Cα bond angle of the linkage in both chromophore isomers is strongly linearized, in the range of 140-160°. The nature of this unusual geometry of the linkage is not clear. It may be assumed that steric tension in the central α-helix, which according to a hypothesis of Barondeau et al. (17) drives chromophore formation, leaves some remnant strain in the mature structure that is partially responsible for the observed effect. The equilibrium strain relaxation is presumably achieved at the expense of the compromised distortion of the linkage preceding the chromophore.
The chromophore in the trans and cis conformations makes three direct H-bonds (≤3.3 Å) with the side chains of the protein, three H-bonds with water molecules (each water molecule mediates H-bonding with two residues), and 106 and 93 van der Waals contacts (≤3.9 Å), respectively (Fig. 4). Two of the three direct H-bonds are formed by the carbonyls of Tyr64 and Gly65 of the chromophore, interacting with the side chains of Arg92 and Trp90, respectively. The third H-bond is formed between the hydroxyl of Tyr64 and the side chain of either Ser158 or Ser143 in the trans or cis conformational state, respectively.
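The distance cutoffs quoted above lend themselves to a simple contact count, sketched below. The classification rule is an assumption made for the example: a pair of N/O atoms within 3.3 Å is counted as a potential H-bond, and any other pair within 3.9 Å as a van der Waals contact. The text gives only the distance limits, so any additional criteria the authors applied are not modeled, and the atom names and coordinates are invented toy data.

```python
import math

HBOND_CUTOFF = 3.3  # Å, distance limit quoted in the text for direct H-bonds
VDW_CUTOFF = 3.9    # Å, distance limit quoted in the text for van der Waals contacts
POLAR = {"N", "O"}  # assumption: only N/O pairs are treated as potential H-bond partners


def classify_contacts(chromophore_atoms, environment_atoms):
    """Split close atom pairs into potential H-bonds and van der Waals contacts.

    Each atom is a (name, element, (x, y, z)) tuple; coordinates are in Å.
    """
    hbonds, vdw = [], []
    for name_a, elem_a, xyz_a in chromophore_atoms:
        for name_b, elem_b, xyz_b in environment_atoms:
            d = math.dist(xyz_a, xyz_b)
            if elem_a in POLAR and elem_b in POLAR and d <= HBOND_CUTOFF:
                hbonds.append((name_a, name_b, round(d, 2)))
            elif d <= VDW_CUTOFF:
                vdw.append((name_a, name_b, round(d, 2)))
    return hbonds, vdw


# Hypothetical toy atoms, not taken from any deposited structure.
chromophore = [("OH", "O", (0.0, 0.0, 0.0)), ("CZ", "C", (1.4, 0.0, 0.0))]
environment = [("OG", "O", (0.0, 2.7, 0.0)), ("CB", "C", (0.0, 3.8, 0.0))]
hbonds, vdw = classify_contacts(chromophore, environment)
print("potential H-bonds:", hbonds)
print("van der Waals contacts:", vdw)
```

Applied to a real model, the two lists would be filled from the chromophore atoms and from the atoms of the surrounding residues and waters.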
The consensus part of the chromophore nearest shell in the mKate_pH2.0 and mKate_pH7.0 structures is composed of 17 residues, most of which are involved in an extensive H-bond network formed by side chain and backbone interactions (Fig. 5). Among them are the catalytic residues Glu215, Arg92, and Thr60. Five proximal waters (presumed reaction products of the maturation process) are actively involved in forming the network mediating residue interactions. The hydrogen-bonded network interacts with the chromophore and is apparently functionally important, creating a potential proton wire in the maturation process. Surprisingly, the positions of the imidazolinone rings, as well as the three-dimensional arrangements of the side chains in the chromophore environment, are practically identical in mKate_pH2.0 and mKate_pH7.0. In both the trans and cis states, the phenolic ring hardly disturbs the chromophore environment. The chromophore isomerization mostly affects the conformational state of Arg197. Two subunits from each tetramer in the mKate_pH7.0 crystal structure with the cis chromophore present two alternative orientations of the Arg197 side chain. The first orientation is generally similar to the one found in mKate_pH2.0 with the trans chromophore and mediates H-bonding between the side chains of Ser158 and Ser143. Those residues were found to be important in stabilizing the trans and cis conformational states of the phenolic ring, respectively. The second orientation of the Arg197 side chain is completely different and mediates the connection between the Glu145 and Glu215 side chains by two salt bridges. The observed alternative orientations of the Arg197 side chain presumably correspond to its different protonation states. In mKate_pH2.0, the side chain of Ser158 also adopts two alternative orientations. The first orientation is identical to that in mKate_pH7.0, and the second one provides H-bonding with the hydroxyl of the chromophore phenolic ring in the trans isomeric form.
As expected, the hydroxyphenyl moiety in the trans and cis isomeric forms exhibits different interactions within the interior of the β-barrel. The trans-cis isomerization results in the replacement of the H-bonding of the chromophore tyrosine hydroxyl with the Ser158 side chain and, via water, with the Glu145 and Thr176 side chains by H-bonding with the Ser143 side chain and, via water, with the backbone of Glu141 and Leu199 (Fig. 4).
In the bright fluorescent state of mKate, the phenolic ring of the cis chromophore occupies a local pocket bordered by the side chains of six residues: Thr60, Ser143, Met160, Arg197, Leu199, and Glu215 (Fig. 5B). The positions occupied by Ser143 and Arg197 in mKate are filled in the wild-type progenitor eqFP578 by Asn and His, respectively (36), whereas the other residues are invariant. Ser143 appears to be very important for stabilizing the cis conformation of the chromophore.
In the dark state, the phenolic ring of the trans chromophore moves to an adjacent pocket composed of the side chains of nine residues: Thr60, Lys67, Arg92, Glu145, Ser158, Met160, Leu174, Tyr178, and Arg197 (Fig. 5A). Two of these residues, Leu174 and Arg197, are Phe and His, respectively, in wild-type eqFP578. Besides the stabilizing H-bonds with Ser158 and the nearest water molecule, the trans phenolic ring makes stacking interactions with the pH-susceptible guanidinium group of Arg197. The position of Arg197 is fixed by H-bonds with Ser143 and with another pH-susceptible residue, Glu145. Ser158 is critically responsible for the observed pH dependence of the fluorescence.
An NMR study of the relationship between cis-trans isomerization and the protonation state of a synthetic chromophore model in solution demonstrated that the free energy gain of the cis form over the trans form is on the order of a single hydrogen bond (41). In the protein, the immediate chromophore environment provides an additional contribution that influences the cis-trans equilibrium and selectively stabilizes one form or the other. The change of the protonation state of mKate upon pH variation influences the electrostatic field in the β-barrel interior, apparently affecting the hydrogen bond system by gain or loss of H-bonds, which might trigger chromophore isomerization.
As mentioned above, the far-red fluorescent proteins, monomeric mKate and dimeric Katushka, have similar spectral characteristics (13,36). Our structural results suggest that three key residues, Ser143, Leu174, and Arg197 (shown in red in the sequence alignment, Fig. 1), residing in the vicinity of the chromophore in mKate (Figs. 4 and 5) and apparently in Katushka, and differing from those in their progenitor eqFP578, are primarily responsible for the far-red shift. The other six positions in mKate and Katushka that differ from those in eqFP578 (highlighted in blue in Fig. 1) reside in the cap area of the β-barrel. The corresponding mutations are presumably silent or covariant with those directly responsible for the far-red spectral shift.
Improved Variant mKate_S158A-Our structural results suggested that replacement of Ser158 by a hydrophobic residue of an appropriate size would cause partial destabilization of the dark trans state of the chromophore, shifting the equilibrium toward the bright cis state and thus increasing the fluorescence intensity at low pH. The mKate_S158A variant was prepared, and we compared its spectral and biochemical characteristics with those of mKate (Fig. 2 and Table 3). mKate_S158A is a bright fluorescent protein with λex, 588 nm; λem, 633 nm; i.e. its emission spectrum is slightly blue-shifted compared with that of mKate (Fig. 2b). As expected, this variant is characterized by substantially higher pH stability (Fig. 2a), with pKa = 5.3, compared with 6.2 for mKate. mKate_S158A is characterized by a substantially higher molar extinction coefficient and also by a higher fluorescence quantum yield, resulting in almost 2-fold brighter fluorescence, as well as by a higher rate of chromophore maturation. In contrast to mKate, at physiological pH 7.0-7.5 it demonstrates no kindling effect upon irradiation by excitation light, in agreement with the trans-cis isomerization hypothesis of the kindling effect (42,43). At this pH, characterized by a relatively low level of cumulative protonation, the equilibrium state of the mKate chromophore has a ~10% fraction of the trans form. Photoactivation of mKate by light irradiation causes switching of the remaining dark trans form to the fluorescent cis form, resulting in a gradual increase of the intensity at the fluorescence maximum by an additional ~10%. The reverse transformation was found to be less favorable. At physiological pH the chromophore of mKate_S158A is, presumably, ~100% in the cis form; thus irradiation does not cause the kindling effect. In addition, the photostability of mKate_S158A is comparable to that of the highly photostable mKate, making it an excellent fluorescent tag for labeling of fusion proteins in the far-red part of the visible spectrum.
a Brightness is calculated as the product of the molar extinction coefficient and the fluorescence quantum yield and is given by comparison to the brightness of EGFP.
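The effect of the lower pKa of mKate_S158A on pH stability, and the brightness definition given in the table footnote, can be illustrated with a short sketch. It assumes a single titratable site described by the Henderson-Hasselbalch relation, which is a simplification introduced here and not a model taken from the paper; only the two apparent pKa values (6.2 and 5.3) come from the text, and the function names are hypothetical.

```python
def fluorescent_fraction(ph, pka):
    """Fraction of molecules in the deprotonated (assumed fluorescent) state.

    Assumes one titratable site obeying the Henderson-Hasselbalch relation.
    """
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


def relative_brightness(ext_coeff, quantum_yield, ref_ext_coeff, ref_quantum_yield):
    """Brightness (extinction coefficient x quantum yield) relative to a reference protein."""
    return (ext_coeff * quantum_yield) / (ref_ext_coeff * ref_quantum_yield)


# Apparent pKa values quoted in the text.
PKA = {"mKate": 6.2, "mKate_S158A": 5.3}

for ph in (4.0, 5.0, 6.0, 7.0, 8.0):
    row = ", ".join(f"{name}: {fluorescent_fraction(ph, pka):.2f}" for name, pka in PKA.items())
    print(f"pH {ph:.1f} -> {row}")
```

Under this simplified model, the variant with the lower pKa retains a larger fluorescent fraction at every pH, which is the qualitative behavior described for mKate_S158A. The reference values passed to `relative_brightness` would be those of EGFP, as in the footnote; none are hard-coded here.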