The origin and evolution of human glutaminases and their atypical C-terminal ankyrin repeats

On the basis of tissue-specific enzyme activity and inhibition by catalytic products, Hans Krebs first demonstrated the existence of multiple glutaminases in mammals. Currently, two human genes are known to encode at least four glutaminase isoforms. However, the phylogeny of these medically relevant enzymes remains unclear, prompting us to investigate their origin and evolution. Using prokaryotic and eukaryotic glutaminase sequences, we built a phylogenetic tree whose topology suggested that the multidomain architecture was inherited from bacterial ancestors, probably simultaneously with the hosting of the proto-mitochondrion endosymbiont. We propose an evolutionary model wherein the appearance of the most active enzyme isoform, glutaminase C (GAC), which is expressed in many cancers, was a late retrotransposition event that occurred in fishes from the Chondrichthyes class. The ankyrin (ANK) repeats in the glutaminases were acquired early in their evolution. To obtain information on ANK folding, we solved two high-resolution structures of the ANK repeat-containing C termini of both kidney-type glutaminase (KGA) and GLS2 isoforms (glutaminase B and liver-type glutaminase). We found that the glutaminase ANK repeats form unique intramolecular contacts through two highly conserved motifs; curiously, this arrangement occludes a region usually involved in ANK-mediated protein-protein interactions. We also solved the crystal structure of full-length KGA and present a small-angle X-ray scattering model for full-length GLS2. These structures explain these proteins' compromised ability to assemble into catalytically active supra-tetrameric filaments, as previously shown for GAC. Collectively, these results provide information about glutaminases that may aid in the design of isoform-specific glutaminase inhibitors.

Glutamine, the most abundant amino acid in human plasma and muscles (1), is consumed by rapidly proliferating tumor cells to meet their increased energy and biosynthetic precursor demands (2,3). The enzyme glutaminase converts glutamine into glutamate, which is further catabolized to produce ATP, nucleotides, amino acids, lipids, and glutathione. In this regard, glutaminase is a well-established target for the inhibition of cell transformation (4). Accordingly, several oncogenes and tumor suppressors have already been described to be involved in glutaminase-dependent glutamine breakdown (4). Notably, c-Myc stimulates glutamine uptake and degradation by GLS (specifically the product of the GLS gene) to sustain cellular viability and Krebs cycle anaplerosis (5). Similarly, nuclear factor-κB (NF-κB) enhances glutamine metabolism by decreasing the expression of miR-23a, which targets the GLS mRNA (6); Rho GTPases, ubiquitin ligase anaphase-promoting complex/cyclosome (APC/C)-Cdh1, and c-Jun also regulate glutamine metabolism by acting on GLS and consequently supporting cancer growth (7-9). However, a second gene encodes another glutaminase enzyme, named GLS2. In contrast to GLS, GLS2 is regulated by the tumor suppressor p53 and has tumor suppressor features toward gliomas, glioblastomas, lung cancer, and hepatocarcinomas (10-14).
The existence of multiple glutaminases in mammals was first reported by Hans Krebs (15), on the basis of the detection of tissue-specific kinetic parameters and their susceptibility to inhibition by their catalytic products, when he probed the conversion of glutamine into glutamate in tissue extracts. In addition to being expressed by two different genes, at least four glutaminase isoforms have been described in mammals. The kidney-type isoforms, kidney-type glutaminase (KGA) and glutaminase C (GAC), are generated by alternative splicing of the GLS gene (2q32.2). Both isoforms are activated by inorganic phosphate and inhibited by glutamate (16-18). In contrast, the liver-type isozymes, liver-type glutaminase (LGA) and glutaminase B (GAB), originate from the GLS2 gene (12q13.3) through the use of alternative transcription initiation sites (19). The recombinant construct that spans a common region between the GLS2 isoforms responds poorly to phosphate (20) and is not inhibited by glutamate (19,21).
Although both the kidney isozymes and GAB are expected to be localized in the mitochondrial matrix because of the transit peptide sequences at their N termini, KGA and LGA can also localize to the cytosol and nuclei (20,22). However, GAC has been shown to be exclusively located in the mitochondrion (20). Moreover, GAC provides key growth advantages to cancer cells (20). KGA, LGA, and GAB share a glutaminase domain that is well conserved in sequence and structurally similar to bacterial glutaminases, and it is flanked by a long N-terminal domain folded in an EF-hand-like four-helix bundle and a C-terminal domain with three putative ANK repeats (20). In contrast, GAC has a shorter, 48-amino acid-long C terminus with no canonical motif or domain. GAC is also unique because it assembles into highly active, long double-stranded helical filaments in the presence of inorganic phosphate, whereas KGA forms shorter and less active structures under the same conditions (23).
Intrigued by the diversity in numbers, architecture, mechanism of activation, and enzymatic capabilities of the human glutaminases, we felt motivated to investigate the origin, evolution, and possible functions of such complex features. First, by identifying homologous prokaryotic and eukaryotic sequences and building a phylogenetic tree, we propose that the multidomain architecture of the mammalian glutaminases GLS and GLS2 is a feature inherited from bacterial ancestors, probably simultaneously with the hosting of the proto-mitochondrion endosymbiont. From this phylogenetic tree, we also propose an evolutionary model wherein a GLS-like predecessor gene in tunicates gave rise to the currently known human isozymes through exon remodeling, gene duplication, and retrotransposition events. In addition, we solved high-resolution crystallographic structures of the C-terminal ANK repeats from both human KGA and GAB/LGA. These structures display a unique type of dimerization for ANK domains, governed by two short amino acid motifs that are highly conserved from bacteria to higher eukaryotes. Finally, we present the first C terminus-containing crystallographic structure of human KGA, which explains its inability to assemble into the catalytically active supra-tetrameric filaments, as previously shown for GAC. A similar model for human GLS2 is also proposed on the basis of small-angle X-ray scattering data and cryo-electron microscopy. Collectively, these results contribute to the general knowledge on the many human glutaminase isoforms as well as explain the mechanistic diversity that poses challenges that still must be overcome to increase the chances of successfully inhibiting the enzyme in a clinical setting.

The multidomain origin of glutaminases
An evolutionary analysis of 2,796 bacterial glutaminase protein sequences and 789 eukaryotic sequences selected from a BLAST search was performed using the glutaminase domain from human KGA (Ile221-Arg544) as the query sequence (E value ≤ 0.0001). Notably, no glutaminase homologs were identified in archaea. The bacterial sequence sizes presented three normal distributions. The first and major distribution (~85% of the sequences) was centered at 313 ± 10 amino acids (Fig. 1A), thus suggesting that the proteins predominantly contained only the catalytic domain (24). The second distribution was smaller (10% of the sequences) and centered at 429 ± 16 amino acids. Detailed analysis suggested that this group possesses a sulfate transporter and anti-σ factor antagonist (STAS) domain located C-terminal to the glutaminase domain (Fig. 1A). Finally, 4% (109) of bacterial glutaminases clustered in a third distribution centered at 613 amino acids. These sequences, in addition to the STAS domain, have an extra cyclic nucleotide-binding domain located N-terminal to the glutaminase domain. Notably, three cyanobacteria sequences from this group possess a glutaminase domain flanked by an Eps15 homology-like domain (EH-like) at the N terminus and two clusters of ANK repeats at the C terminus (Fig. 1A). The Eps15 homology domain is a pair of EF-hand motifs that recognize and bind to proteins containing Asn-Pro-Phe (NPF) sequences (25). Aside from these three-domain cyanobacteria sequences, 17 proteo- and actinobacteria sequences situated between the second and third normal distributions (Fig. 1A, blue oval) also exhibit a glutaminase domain flanked N-terminally by an EH-like domain and C-terminally by only one cluster of ANK repeats.
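The three bacterial length classes described above can be illustrated with a minimal sketch that assigns a sequence length to the nearest reported distribution center. The centers (313, 429, and 613 amino acids) and the counts (109 of 2,796) are taken from the text; the class labels, the function name `nearest_class`, the nearest-center rule, and the example lengths are assumptions made for illustration only and do not reproduce the analysis behind Fig. 1A.

```python
# Reported centers of the three bacterial length distributions (amino acids).
# The labels are simplified, hypothetical names for the dominant architecture.
CENTERS = {
    "glutaminase only": 313,
    "glutaminase + STAS": 429,
    "cNMP + glutaminase + STAS": 613,
}


def nearest_class(length):
    """Return the label whose distribution center is closest to `length`."""
    return min(CENTERS, key=lambda label: abs(CENTERS[label] - length))


# Hypothetical sequence lengths, not taken from the data set.
for n in (310, 440, 600):
    print(n, nearest_class(n))

# Share of the 2,796 bacterial sequences that fall in the third distribution.
third_share = round(100 * 109 / 2796, 1)
print(third_share)  # 3.9, reported as 4% in the text
```

A nearest-center rule is the simplest possible assignment; a mixture-model fit would be needed to recover the widths and weights of the distributions themselves.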
Eukaryotic glutaminases, which primarily belong to the kingdoms Animalia and Fungi, are consistently longer and present a single Gaussian size distribution averaging 631 amino acids in length (mode at 615), with a skewed standard deviation of 45 amino acids toward shorter sequences and 26 amino acids toward longer sequences (Fig. 1B). The left-most end of the distribution contained the fungal glutaminases, which are 450 to 500 amino acids long and present a unique architecture, wherein the glutaminase domain is followed by a long C terminus (156 ± 7 amino acids) with an unknown fold (Fig. 1B, pink bar). The remaining eukaryotic glutaminases that are longer than 500 amino acids in length mostly belong to the Animalia kingdom and have extended N-terminal sequences of varying lengths (Fig. 1B). The N-terminal region immediately adjacent to the glutaminase domain is more than 40% similar to the EF-hand-like four-helix bundle, which is structurally similar to the EH-like domain found in bacteria. Indeed, the X-ray structures of human GLS containing this domain have an EH-like structure (20).
Putative ANK repeats are predicted for about 550 of the longer eukaryotic glutaminases and are located C-terminal to the glutaminase domain with no exception. Another 76 sequences, all of mammalian origin, have a shorter C terminus that closely resembles human GAC in sequence (Fig. 1B). Finally, a few glutaminase sequences (11 in total) belonging to the major haptophyte and heterokont lines of eukaryotes were identified. These sequences are much longer, ranging from 800 to 1250 amino acids in length, and the predicted architectures of some of them consist of two complete tandemly associated ANK-containing glutaminases (data not shown).
To detect evolutionary relationships based on the architectural organization, we next reconstructed an unrooted cladogram using the analyzed sequences (Fig. 1C). The derived tree structure has a taxa organization similar to that found in rRNA-based phylogenetic trees (26). Many unresolved nodes (polytomy) were observed, particularly for the bacterial glutaminases. We believe that this observation stems from the fact that many bacterial species possibly have more than one glutaminase, often as a result of the combination of a single protein with a multidomain isozyme. Except for fungal glutaminases, which clustered closer to the bacterial homologs, the eukaryotic proteins had a better-resolved tree. Protists, nematodes, and arthropods form a polytomic group with chordates, as well as with ANK-containing proteo- and cyanobacteria; this grouping suggests that the appearance of this multidomain feature was an early event, concomitant with the symbiotic association related to the appearance of eukaryotes, because archaea do not carry any glutaminase-like coding gene. A subsequent lineage split linked to the appearance of multiple variations around the same basic architecture (EH-like + glutaminase + ANK) suggests that the gene duplication event that produced the GLS and GLS2 genes and isoform differentiation (with the appearance of GAC-like isoforms; Fig. 1B, dark blue oval) appeared with chordates. However, interestingly, birds have completely lost the GLS2 gene.
Finally, based on a theoretical reconstruction of the ancestral glutaminases from each of the main eukaryotic branches (chordata, arthropods, nematodes, and fungi), the most primitive glutaminase acquired from a bacterial ancestor was most probably the kidney-type glutaminase (KGA-like), as shown in Fig. 1D. The general organization EH-like + glutaminase + ANK is preserved in all four reconstructed ancestral sequences. Comparison of the sequence identity of human genes with the reconstructed common ancestor of GLS and GLS2 from Gnathostomata suggested that GLS was restrained during evolution to maintain the same function, whereas GLS2 exhibited more changes, probably acquiring new functions and/or regulatory mechanisms.

Figure 1. The sequences obtained were analyzed by number of amino acids (aa); the normal distributions of the sequence lengths for bacterial and eukaryotic glutaminases are represented in A and B, respectively, with the respective architectures indicated in rectangles; in parentheses, the number of sequences containing the corresponding architecture/the total number of sequences contained in the referred region of the distribution plot. C, cladogram based on architectural organization obtained from a maximum likelihood phylogenetic reconstruction approach. D, the generated phylogenetic tree and respective alignment were used to reconstruct ancestral protein sequences for specific nodes (ancestral chordates (1); nematodes (2); arthropods (3); and fungi (4)) using maximum parsimony. The obtained ancestral sequences were aligned with the human GLS (KGA and GAC) and GLS2 (GAB and LGA), and the sequence similarity is displayed as a heatmap of pairwise distances constructed in SDT using MUSCLE alignment. The obtained result suggests that GLS is the most primitive gene, which has been further duplicated to generate GLS2. cNMP, cyclic nucleotide-binding domain; SRPBCC, START/RHOαC/PITP/Bet v1/CoxG/CalC.
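The comparison of human and reconstructed ancestral sequences rests on pairwise identities computed from an alignment. The sketch below shows one way such a matrix could be derived from pre-aligned sequences. It is an illustration under stated assumptions: the gap-handling rule (columns with a gap in either sequence are skipped), the function names, and the toy sequences are hypothetical and are not the scoring used by SDT or MUSCLE.

```python
def identity(a, b):
    """Fraction of identical residues over columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return sum(x == y for x, y in pairs) / len(pairs)


def identity_matrix(aligned):
    """Build a nested dict of pairwise identities from {name: aligned sequence}."""
    names = list(aligned)
    return {p: {q: identity(aligned[p], aligned[q]) for q in names} for p in names}


# Toy aligned sequences (hypothetical, not glutaminase data).
toy = {"seqA": "MKT-LLA", "seqB": "MKTALLA", "seqC": "MRT-LIA"}
matrix = identity_matrix(toy)
for name, row in matrix.items():
    print(name, {other: round(value, 2) for other, value in row.items()})
```

A heatmap such as the one in Fig. 1D is then a color-coded rendering of this kind of matrix.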

The origin of glutaminase isoforms
Having established a plausible origin for the multidomain architecture of the glutaminases, we next asked how the different variants of the enzyme, currently observed in humans, might have evolved. We first performed a "top-down" search for glutaminase genes in phylum Chordata using the complete genomic sequence of the human GLS and GLS2 genes as templates. Second, we identified regions homologous to exon 14 of GLS (after which the GLS splicing event occurs) and exon 15 (the "GAC exon").
In the lower chordates, we verified that the genome of the tunicate sea squirt (27) (Ciona intestinalis) has a single GLS-like gene containing 12 exons. A region similar to human exon 14 (70% identity, red bar in the tunicate branch, Fig. 2A) is inserted in exon 11. The cephalochordate amphioxus (28) (Branchiostoma floridae) also has a single glutaminase gene, which, interestingly, has only two exons. In this gene, the region homologous to the human exon 14 is appropriately located at the 3′ boundary of exon 1; however, no homology to the GAC exon was identified in the downstream sequence. A gene duplication event was observed further upstream in lampreys (29) (Petromyzon marinus, Hyperoartia), which generated distinct sequences related to the human GLS and GLS2 genes.
Next, the Chondrichthyes Callorhinchus milii (elephant shark (30)) also has two glutaminases that are phylogenetically related to the GLS and GLS2 genes without annotated splicing regulation. However, in the GLS gene, a region homologous to the human GAC exon is for the first time observed and located downstream of an exon 14-equivalent sequence, thereby suggesting an origin for the splicing variant GAC. Zebrafish (Danio rerio (31)) and other fish species possess 5 glutaminase genes, thus suggesting subsequent autapomorphic duplication events for both GLS and GLS2. Interestingly, the GAC isoform was maintained in both copies of the GLS gene, thus supporting the hypothesis that the GAC exon appeared before this duplication; a new GLS-like (GLSL) sequence was identified in species within this branch, such as zebrafish, elephant shark, and others.
Of note, the amphibian Xenopus tropicalis (Western clawed frog (32)), with only one copy of both GLS and GLS2, points to a compaction of the GLS2 introns and exons against the spread of the GLS across a larger genomic region. Finally, the human glutaminase gene structures retained the same architecture observed in the amphibian genes, although the sequence lengths were smaller.
The identification of several transposable elements, such as Alu and L2 (Fig. 2B), within the human GLS intron 15 strongly suggests that the insertion of the GAC exon was due to an early retrotransposition event; however, no consensus matching elements were found in the rapidly evolving intron 14. Moreover, a thorough tblastn analysis failed to identify a region homologous to the GAC exon (exon 15) elsewhere within the human genome, preventing us from proposing an original location for this sequence.

Crystal structures of the human glutaminase ankyrin repeats
We next determined the crystal structures of the ANK-containing C-terminal regions of both human KGA (KGA.ANK: Val551-Leu669) and GLS2 (GLS2.ANK: Lys485-Val602). Both proteins were subjected to limited proteolysis with trypsin (for KGA.ANK) or chymotrypsin (for GLS2.ANK) during purification to remove the likely intrinsically flexible regions and obtain well-diffracting crystals. X-ray diffraction datasets were obtained for KGA.ANK crystals belonging to two different space groups, namely the tetragonal P4₃2₁2 and the monoclinic P12₁1. Sulfur single-wavelength anomalous dispersion (SAD) phasing was used to produce a higher resolution molecular model (at 1.41 Å maximum) from the tetragonal crystals. The obtained structure was then used as a search model for molecular replacement of the monoclinic crystals, which diffracted up to 1.74-Å maximum resolution. Finally, the structure of the GLS2.ANK hexagonal crystals (space group P6₅), for which data were collected at 2.55-Å maximum resolution, was solved by molecular replacement using the KGA.ANK structure. The statistics of the data collection, processing, phasing, and model refinement for all the structures are shown in Table 1. Because of the high structural similarity observed for both KGA.ANK structures (backbone r.m.s. deviations of 0.4 Å), only the higher resolution model was used for the subsequent detailed analysis and comparison to the GLS2.ANK structure.
Overall, the C-terminal domains of KGA and GLS2 each contain three ANK repeats (labeled ANK1 to ANK3) and are very similar, both in sequence composition (80% identical) and overall structure (r.m.s. deviation of 0.5 Å over 89 equivalent Cα positions; Fig. 3A). The largest structural variation (a shift between 1.3 and 1.8 Å of the Cα atom positions) occurs in the β-hairpin that links ANK1 to ANK2 (β-hairpin 2), whereas the minimal deviations (~0.1 Å) are observed in the inner and outer α-helices of ANK2. The primary sequences of ANK1 and ANK3 repeats in both KGA and GLS2 diverge from the canonical ANK repeats at the TPLH tetrapeptide motif, which is located at the beginning of the inner α-helices and is responsible for the stabilization of ANK repeats through reciprocal hydrogen bonding between the threonine hydroxyl and histidine imidazole groups (33,34) (Fig. 3B). The ANK1 inner helix of both KGA and GLS2 is a half-turn longer and lacks the β1 turn compared with the typical inner helix observed in ANK repeats (Fig. 3B), whereas in the ANK3 inner helix, the histidine is replaced with an aspartate, thus preventing the formation of the conserved hydroxyl-imidazole hydrogen bond. Moreover, in both structures, ANK2 contains an extra residue, a surface-exposed lysine (Lys611 in KGA and Lys543 in GLS2, indicated by a plus sign in Fig. 3B), at the C terminus of the outer helix. Finally, the GLS2 ANK3 outer helix is one turn longer than the canonical ANK repeats (Fig. 3B).
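The r.m.s. deviation quoted above summarizes the distances between equivalent Cα atoms in two structures. As a minimal sketch, assuming the two coordinate sets have already been optimally superposed and paired (a step not shown here), the value can be computed as follows. The function name and the toy coordinates are hypothetical and are not taken from the deposited models.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation (same units as input) between paired 3D points.

    Assumes the two structures are already superposed and that the lists
    contain equivalent atoms in the same order.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Toy example: every atom displaced by 0.5 Å along x.
model_1 = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
model_2 = [(0.5, 0.0, 0.0), (1.5, 0.0, 0.0)]
print(rmsd(model_1, model_2))
```

Because the deviation is averaged over all paired atoms, a local shift of 1.3 to 1.8 Å in one hairpin is compatible with an overall value of 0.5 Å.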

Glutaminase ankyrin repeats assemble into atypical dimers
A search for the biologically relevant assembly of the KGA and GLS2 ANK domains was performed using the PISA (protein interfaces, surfaces and assemblies) server (35). The program identified a thermodynamically stable interface involving a crystallographic homodimer, with a solvation free energy gain of −5.4 kcal/mol upon formation (Fig. 3C). The identified interface occludes an area of 800 Å² and corresponds to a symmetric groove-to-groove interaction in both structures. Twelve hydrogen bonds and 12 salt bridges are formed between the short loops present in the β-hairpins 1 (linking ANK1 to ANK2) and […] 3C). Despite the overall relatively low sequence identity of the eukaryotic and bacterial glutaminase ANK repeats, both the DYD and DRW motifs are highly conserved (Fig. 3, D and E, respectively). Finally, additional polar interactions are observed between the β-hairpin 1 loop and the ANK3 inner helix and between the β-hairpin 2 loop and the ANK2 inner helix, complementing the identified symmetrical interface (data not shown).

Crystal structure of full-length human KGA
In addition to the determination of the crystal structures of the isolated C-terminal ANK dimers, we also solved the novel crystal structure of KGA containing the C terminus, which was bound to the inhibitor bis-2[5-phenylacetamido-1,2,4-thiadiazol-2-yl]ethyl sulfide (BPTES) (36). The diffraction dataset was collected at a 3.6-Å maximum resolution (space group P4₂2₁2) with lengthy cell parameters and a high solvent content of ~78% (Table 1). Phasing was achieved by molecular replacement using the coordinates of the previously solved N-terminal and glutaminase domains from GAC bound to BPTES (PDB code 4jkt) and the KGA.ANK dimer described here as search models. The P4₂2₁2 asymmetric unit contains five KGA monomers; accordingly, one canonical tetramer (a dimer of dimers) is formed within the asymmetric unit by four monomers, and the fifth monomer is related to another tetramer by the crystal 2-fold screw axis. The final model obtained (Fig. 4A) was refined to R factor and R free of 27.3 and 31.8%, respectively.
The ANK dimers in the KGA structure (green surface in Fig. 4B) are spatially located between the N-terminal EF-hand-like domains (blue surfaces in Fig. 4B). Notably, both ANK dimers of a tetramer lie on the same side of a sectional plane defined by the two longest axes of the protein (Fig. 4B), breaking the previously established 2-fold dihedral symmetry of the glutaminase structures lacking the C terminus (20).
The low-resolution model did not allow us to provide a detailed description of the side chain interactions between the ANK dimers and the N-terminal domain. However, the close contact between the ANK1 outer helix of chain A and the EF-hand-like domain helix H1 of chain B is unmistakable (Fig. 4C). The interface contains large, exposed polar residues, such as glutamic acid and arginine, which are likely to be responsible for the contact. An equivalent arrangement was also observed for the ANK1 outer helix of chain C with the EF-hand-like domain helix H1 of chain D (data not shown). The majority of the C-terminal region of KGA adjacent to the ANK repeats (residues Thr647-Leu669) is likely heterogeneous in conformation and therefore could not be modeled. According to secondary structure predictions by the JPred4 server (37), this region is mostly unstructured. This information, added to the fact that this region contains a KEN box involved in KGA degradation (8), suggests that this segment is likely functionally independent from the ankyrin domain.
As shown in our previous study (23), the phosphate-dependent enzymatic activity of the GLS isoforms is directly related to their ability to self-assemble into supra-tetrameric helical filaments; GAC is the most active isoform and forms longer filaments. Moreover, we previously demonstrated that this assembly was generated by end-to-end associations via the N-terminal domains; the KGA assembly was always shorter, although it formed the same structure. Based on the KGA crystal structure presented here, we hypothesized that the ANK repeats located between the N-terminal domains prevent KGA from forming longer, more active filaments because they destabilize polymerization. More specifically, the ANK dimers limit the association of the KGA tetramers into long polymers (23) because they limit the formation and availability of the filamentation interface (Fig. 4D). This interface is the groove region by which the single-strand filaments of the GAC isoform grow indefinitely, via end-to-end interactions between pairs of N-terminal domains (23).

Figure 3. […] (39). Highly conserved residues are capitalized and in red; semiconserved residues are colored cyan and not capitalized. Residues involved in the dimer interface are indicated by a bar. ANK2 in both KGA and GLS2 contains an extra surface-exposed lysine, which is indicated by the + sign. On the right, the superposition of ANK1, ANK2, and ANK3 is represented. C, dimer interface and associated interaction of ANK repeats. The side chains of the motifs DYD (left) and DRW (right) are represented in sticks. D, representation of sequence conservation of glutaminase ANK repeats from eukaryotes and E, bacteria. The size of the letters is proportional to the degree of conservation of residues. The motifs DYD and DRW are highlighted. Residues in red show 100% conservation throughout the alignment. Residues are numbered according to human KGA.
To further confirm this hypothesis, we generated a C-terminally truncated GLS construct containing the common sequence between KGA and GAC (N terminus and glutaminase domains, called ΔC), but lacking the ANK repeats. Accordingly, the construct had a larger Stokes radius in comparison to KGA and GAC and was more heterogeneous in size (Fig. 4E). Moreover, as expected, the ΔC construct was as active as GAC, possibly due to its increased ability to form extended filaments (Fig. 4F).

Low resolution structure of GLS2
Last, the availability of the crystal structures of individual domains of GAB/LGA, such as the C-terminal ANK dimers reported here and the glutaminase domain tetramer (PDB code 4bqm), allowed us to propose a full-length structure based on small-angle X-ray scattering data (Fig. 5, A and B) for the multidomain portion common to both isoforms. The C-terminal ANK domains are expected to be spatially located between the N-terminal domains, in an organization similar to that observed for KGA. Consequently, this protein is also expected not to assemble into filament-like higher order oligomers, remaining stable in the tetrameric form. This has been previously observed by our group (23) and is here corroborated by measurements on cryo-electron microscopy images (Fig. 5C).

Discussion
The human genome has two copies of glutaminase-coding genes, which are known to produce at least four isoforms through alternative splicing or transcription at alternative initiation sites (4). Genetic evolutionary models often predict an increase in the gene copy number prior to the specialization or gain of new functions, with a momentary intermediate state of specialization with an overlap of functions (38). In addition, this apparent redundancy in glutaminase isoforms can be explained by differential tissue expression patterns, as well as the cell proliferation state. In this regard, KGA is expressed in kidney tubule epithelial cells, in which ammonia production is key to renal acid-base regulation (16); LGA is expressed in periportal hepatocytes, where it participates in urea synthesis (17). In addition, although GAC meets the glutamine-dependent metabolic demands of several types of cancer (20), GAB has tumor-suppressing activity in hepatocarcinomas and gliomas (40).
Despite this functional diversity, the human glutaminase isoforms converge into a conserved multidomain structure. Based on a combination of computational predictions and experimental evidence mainly produced by X-ray crystallography studies, we found that all four isoforms contain an N-terminal EF-hand-like domain, followed by a glutaminase domain in the middle and three ANK repeats at the C terminus. The only exception is GAC, in which the C terminus is a short unstructured region, because of alternative splicing of the GLS gene.
In this work, the multidomain architecture was shown to be shared by all chordates, including fishes, reptiles, birds, and amphibians, as well as by arthropods and nematodes. By surveying the database-deposited glutaminase protein sequences, we observed similar organizational characteristics in free-living proteo- and cyanobacteria, but not fungi. From our data, we propose that the multidomain glutaminase structure has a very ancient origin. In addition, we verified that multiple GLS genes and isoforms are present in a wide range of vertebrates. By further analyzing the genomes of representative species of the phylum Chordata, we were able to identify early events of exon repositioning among tunicates and cephalochordates, followed by gene duplication in Hyperoartia and exon retrotransposition in Chondrichthyes, concomitant with the change from a simple kidney structure pronephros to a more complex mesonephros with tubules (41). Together, all these features are likely the basis of the isozyme diversity observed in vertebrates. We also identified a glutaminase ancestor that was more similar to the human kidney-type glutaminase and gave rise to the liver-type glutaminase, whereas the appearance of GAC-like glutaminases, with shorter C termini, was a late event in glutaminase evolution. GAC is the most active isoform compared with KGA and LGA, a feature linked to its capacity to assemble into long filament-like superstructures (23). In the present study, we found that this feature is possible only because GAC lacks the original C-terminal bulky ANK repeats of parental KGA, because the ΔC construct assembles into longer filaments and is as active as GAC. Therefore, we propose that filament formation is a gain-of-function characteristic of vertebrate glutaminases, which has been positively selected to create a more active enzyme due to the substrate channeling effect.
Our group and others have previously published the structure of the N-terminal EF-hand-like and glutaminase domains of GLS (20,42), but the structure of the C-terminal ANK repeats remained unsolved. Here, we provide novel crystal structures for the ANK domains of KGA and GLS2, which surprisingly form an atypical dimer. ANK repeats usually mediate the interaction of a protein with a different partner. However, KGA.ANK and GLS2.ANK mediated the formation of homo-oligomers, a rare feature. A survey of structures in the Protein Data Bank revealed 68 unique crystal structures of ANK-containing proteins, containing between 2 and 24 ANK repeats (supplemental Table S1). Of these crystal structures, only 5 displayed an ANK-to-ANK association (supplemental Fig. S1). However, none contained an interface that resembles the glutaminase ANK dimer (contact between the DYD and DRW motifs). Therefore, we propose that the ANK dimer association described here is structurally unique to glutaminases and was selected as a conserved feature throughout glutaminase evolution. The explanation for this positive selection remains elusive.
The short regions located immediately after the ANK repeats in KGA and GLS2 (KEN and ESMV motif, respectively) are involved in E3 ubiquitin ligase-dependent degradation (8) and direct interactions with PDZ domain-containing proteins (43), respectively. An analysis of protein complexes including ANK-containing proteins and their interaction partners (44-47) showed that most of the contacts necessary for ANK-to-ANK interaction involve the ANK groove (the concave surface). Although most of the known cases involve the concave surface, some ANK repeats mediate protein-protein contacts through the convex face. One example is the human oncoprotein gankyrin, which was crystallized in complex with an antibody fragment. In this structure, the interaction occurred through the outer α-helices of the ANK4-ANK6 repeats (48). The vaccinia virus K1 protein consists entirely of ANK repeats that are involved in interactions mediated by the convex surface (49). In addition, the VPS9-domain ANK repeat protein binds to Rab32 through the convex side of its ANK repeat (50).
In this regard, although the concave faces of the ANK repeats of KGA and GLS2 make contacts within the glutaminase, we predict that the ANK repeats may still mediate protein-protein interactions through the convex surface. Recently, GAB has been shown to bind and inhibit the small GTPase Rac1 by preventing its interaction with a guanine-nucleotide exchange factor, an interaction that involves the C terminus of GLS2 (residues 464 to 602) (40). Because the region contains the ANK repeats of GAB, we predict that the contact involves the ANK convex face. Considering the long list of partners that were shown to interact with glutaminases using mass spectrometry-based approaches (51), further investigations are required to confirm the mechanism by which the ANK repeats in glutaminases mediate protein-protein interactions.
Last, although at low resolution, the novel ANK-containing structure of human KGA confirms our previous hypothesis that the long-range propagation of filaments for this isoform is thwarted by the presence of the ankyrin repeats themselves (23). As observed in the crystallographic model, the ANK dimer occludes the filamentation groove, by which the single-strand filament of the GAC isoform grows indefinitely via end-to-end interactions between pairs of N-terminal domains. GAC possesses a shorter, unstructured C-terminal sequence, and the hypothesis is further supported by a mutant construct in which the ANK domain is completely deleted: this construct can form longer polymeric species, resulting in a more active protein.

Phylogenetic reconstruction
A total of 726 glutaminase protein sequences were obtained from GenBank (52) by performing a BLAST search with the human glutaminase domain sequence from GLS as the query and were aligned using Kalign (53). Maximum likelihood phylogenetic reconstruction was performed with RAxML (54) with 260 bootstrap pseudoreplicates and using automated model search. Replicates were summarized with sumtrees from DendroPy (55). The generated phylogenetic tree and respective alignment were used to reconstruct ancestral states for specific nodes using maximum parsimony, as implemented in phangorn (56). Heatmaps of pairwise distances were constructed in SDT (57) using MUSCLE (58) alignments.
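The pairwise comparison underlying such heatmaps can be illustrated with a minimal sketch. It assumes the sequences are already aligned to equal length and scores identity only over columns where neither sequence has a gap, which is one common convention and not necessarily the exact procedure implemented in SDT. The function names and the toy sequences are hypothetical and are not taken from the glutaminase dataset.

```python
from itertools import combinations


def pairwise_identity(a, b, gap="-"):
    """Fraction of identical residues over columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(a, b):
        if x == gap or y == gap:
            continue  # skip columns with a gap in either sequence
        compared += 1
        if x == y:
            matches += 1
    return matches / compared if compared else 0.0


def identity_matrix(aligned):
    """Symmetric identity matrix as nested dicts, keyed by sequence name."""
    names = list(aligned)
    matrix = {name: {name: 1.0} for name in names}
    for p, q in combinations(names, 2):
        score = pairwise_identity(aligned[p], aligned[q])
        matrix[p][q] = score
        matrix[q][p] = score
    return matrix


# Hypothetical toy alignment, for illustration only.
toy_alignment = {"seqA": "MKV-LA", "seqB": "MKVQLA", "seqC": "MRV-IA"}
toy_matrix = identity_matrix(toy_alignment)
```

A pairwise distance for plotting can then be taken as one minus the identity score.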

Evolution of glutaminase exon structure
The protein sequence corresponding to human GLS exon 14 was used as input for translated BLAST (tblastn) searches against the available genomes of C. intestinalis (27), B. floridae (28), P. marinus (29), C. milii (30), D. rerio (31), and X. tropicalis (32). After an initial Reciprocal Best Hit (59) region was found, the correct position of the exon was determined by pairwise LALIGN between sequences (60). Regions downstream from the genomic sequence homologous to exon 14 were also evaluated by LALIGN against the human GAC-exclusive exon (exon 15). As both P. marinus glutaminases are incomplete, the available exons were located in contigs using the LALIGN approach. The GLS region comprising intron 14, exon 15, and intron 15 was used as input for the TranspoGene (61) retrotransposon search.
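The Reciprocal Best Hit criterion can be sketched as follows. This is a minimal illustration that assumes each search result has already been reduced to (query, subject, bit score) tuples; the identifiers and scores shown are hypothetical and do not come from the searches described above.

```python
def best_hits(hits):
    """Map each query to its highest-scoring subject.

    hits: iterable of (query, subject, bitscore) tuples.
    """
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(forward, reverse):
    """Pairs (q, s) where s is the best hit of q and q is the best hit of s."""
    fwd = best_hits(forward)
    rev = best_hits(reverse)
    return sorted((q, s) for q, s in fwd.items() if rev.get(s) == q)


# Hypothetical hit tables, for illustration only.
forward_hits = [
    ("human_exon14", "contig_1", 88.0),
    ("human_exon14", "contig_7", 41.5),
]
reverse_hits = [
    ("contig_1", "human_exon14", 90.2),
    ("contig_7", "other_gene", 55.0),
]
rbh_pairs = reciprocal_best_hits(forward_hits, reverse_hits)
```

Only the pair whose best hits point at each other in both directions is retained; the lower-scoring contig is discarded.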

Protein expression and purification
The constructs of human KGA Val124-Leu669 and Val551-Leu669 were amplified from the pcDNA3.1/hKGA-V5 clone, kindly provided by Dr. Richard Cerione (Cornell University, Ithaca, NY), and subcloned into the pET28a plasmid (Novagen) using the NdeI and XhoI restriction sites, with an N-terminal His₆ tag. KGA Val124-Arg550 was generated from the pET28a KGA Val124-Leu669 construct by site-directed mutagenesis of the Val551 residue into a stop codon (TAA) using the QuikChange II XL Site-directed Mutagenesis Kit (Agilent Technologies). The construct of human GAB Lys485-Val602 was cloned into the pNIC28-Bsa4 plasmid. The construct of GAC Met128-Ser603 was amplified from a mouse fetal brain tissue cDNA library and cloned into the pET28a plasmid. The four constructs were transformed into Escherichia coli Rosetta-2 thermocompetent cells (Merck). Overnight cultures, grown in LB medium in the presence of 50 μg ml⁻¹ of kanamycin and 50 μg ml⁻¹ of chloramphenicol, were inoculated at a ratio of 1:100 into 3-liter cultures supplemented with the same antibiotics and left shaking at 200 rpm for 5 h at 37°C. The cultures were then cooled to 18°C for 1 h before induction with 200 nM isopropyl β-D-1-thiogalactopyranoside for 16 h at 18°C. Cells were collected by rapid centrifugation and resuspended in 500 mM NaCl, 50 mM Tris-HCl, pH 8.5 (or 50 mM HEPES, pH 7.5, for the GAB construct), 10% glycerol, 2 mM β-mercaptoethanol, and 1 mM phenylmethylsulfonyl fluoride (PMSF). Cell lysis was performed chemically by incubation with hen egg white lysozyme, DNase I, and deoxycholate (all three reagents from Sigma) for about 1 h on ice.
The soluble fractions were separated from the debris by high speed centrifugation and subsequently loaded, by gravity and in a cold room, on an immobilized metal affinity column, Co²⁺-charged TALON (Clontech) for the GLS constructs and nickel-nitrilotriacetic acid Superflow (Qiagen) for the GAB construct, previously equilibrated with the running buffer 10 mM NaCl and 50 mM Tris-HCl, pH 8.5, or HEPES, pH 7.5. The constructs were eluted stepwise using running buffer to which up to 500 mM imidazole (v/v) had been added. For the GLS constructs, the tag was removed by overnight digestion with bovine thrombin (Sigma), and the samples were loaded into a HiTrap Q HP anion exchange chromatography column (GE Healthcare). Elution was done with a linear gradient of a buffer containing 1 M NaCl, 50 mM Tris-HCl, pH 8.5, and 2 mM β-mercaptoethanol. The fractions containing the GLS constructs were loaded in a HiLoad 16/600 Superdex 200 (for the KGA Val124-Leu669, KGA Val124-Arg550, and GAC Met128-Ser603 constructs) or a HiLoad 16/600 Superdex 75 pg (for KGA Val551-Leu669) column (GE Healthcare). The final buffer conditions were 150 mM NaCl, 30 mM HEPES, pH 8, and 0.5 mM tris(2-carboxyethyl)phosphine. For the GAB construct, the eluate from the affinity chromatography was directly loaded in a HiLoad 16/600 Superdex 75 pg. Protein concentration was determined by UV absorbance at 280 nm using calculated extinction coefficients. The hydrodynamic parameters (Stokes radii, Rh) of human KGA, GAC, and the deletion mutant ΔC were determined by gel filtration chromatography using a prepacked Superdex 200 HR 10/30 column (GE Healthcare) in 25 mM Tris-HCl, pH 8.0, 150 mM NaCl, 0.5 mM tris(2-carboxyethyl)phosphine. For each purified protein, a 500-μl volume at a concentration of ~1 mg ml⁻¹ was injected into the column. To induce the formation of the higher-order oligomeric species, K₂HPO₄ was added to the protein solution to a final concentration of 20 mM before loading it onto the column.
The gel filtration buffer was also supplemented with 20 mM K₂HPO₄. The flow rate was maintained at 0.5 ml min⁻¹. To establish the relationship between hydrodynamic radius (Rh) and elution volume, proteins of known Rh were run on the same column. The proteins used as standards were ferritin (440 kDa, Rh = 60.8 Å), aldolase (158 kDa, Rh = 48.1 Å), conalbumin (75 kDa, Rh = 36.4 Å), ovalbumin (44 kDa, Rh = 30.5 Å), and ribonuclease A (13.7 kDa, Rh = 15.9 Å).
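A calibration of this kind can be sketched as a least-squares fit of the known Rh values against elution volume. The functional form is not specified here, so the linear relationship below is an assumption (other treatments use the partition coefficient instead), and the elution volumes are hypothetical placeholders because the measured values are not reported in this section. Only the Rh values of the standards come from the text.

```python
# Rh values (in angstroms) of the standards, as listed in the text.
STANDARD_RH = {
    "ferritin": 60.8,
    "aldolase": 48.1,
    "conalbumin": 36.4,
    "ovalbumin": 30.5,
    "ribonuclease A": 15.9,
}

# Hypothetical elution volumes (ml), for illustration only.
HYPOTHETICAL_ELUTION_ML = {
    "ferritin": 10.5,
    "aldolase": 12.6,
    "conalbumin": 14.4,
    "ovalbumin": 15.4,
    "ribonuclease A": 17.8,
}


def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


names = list(STANDARD_RH)
slope, intercept = fit_line(
    [HYPOTHETICAL_ELUTION_ML[n] for n in names],
    [STANDARD_RH[n] for n in names],
)


def rh_from_elution(volume_ml):
    """Interpolate Rh (angstroms) for a sample eluting at volume_ml."""
    return slope * volume_ml + intercept
```

With real elution volumes substituted for the placeholders, `rh_from_elution` would return the Rh estimated for a sample peak under the linear assumption.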

Crystallization
Following size-exclusion purification, KGA Val124-Leu669 was concentrated using an Amicon 30-kDa cutoff concentrator (Millipore) to a final concentration of ~7.5 mg ml⁻¹. Crystallization experiments were performed at 277 K using the conventional sitting drop vapor diffusion technique. Drops were made by mixing two parts of protein previously incubated with 1.2 mM BPTES to one part of the well solution, containing 1.8 M sodium formate, 0.5 M NaCl, and 0.1 M BIS-TRIS propane, pH 6.8. Before data collection at cryogenic temperature (100 K), harvested crystals were cryoprotected with 10% ethylene glycol added to the mother liquor.
For large-scale limited proteolysis, KGA Val551-Leu669 after a size-exclusion purification step was incubated with 1:100 trypsin (Sigma) at 23°C for 20 min. The proteolysis was stopped by the addition of 1.5 mM PMSF. For limited proteolysis of GAB Lys485-Val602, purified protein was incubated overnight with 1:1000 α-chymotrypsin at 23°C. The digested fragments were immediately purified by size-exclusion chromatography. For crystallization trials, digested KGA Val551-Leu669 and GAB Lys485-Val602 were concentrated to 25 and 50 mg ml⁻¹, respectively, using Amicon 10-kDa cutoff concentrators.
Both constructs were crystallized by sitting drop vapor diffusion, by mixing equal parts of protein solution and mother liquor. KGA Val551-Leu669 crystals were grown in (a) 3.

X-ray crystallography
Diffraction data for KGA Val124-Leu669 and for the KGA/GAB ANK crystals were collected at beamline I03 of the Diamond Light Source (UK) and beamline 12-2 of the Stanford Synchrotron Radiation Lightsource, respectively. Datasets were integrated using Mosflm (62) (for native datasets) and XDS (63) (for the sulfur-SAD dataset) and scaled with Aimless (64). The first set of phases of KGA Val124-Leu669 was obtained by molecular replacement as implemented in the program Phaser (65), using the coordinates of the mouse GAC isoform (PDB code 3SS3). KGA ANK was solved by sulfur-SAD using SHELX (66), and the model was refined using higher resolution native datasets. The model obtained for KGA was employed as a search model for solving the GAB ANK structure by molecular replacement (Table 1). Positional and B-factor refinement cycles, as well as solvent modeling, were performed with Refmac (67) and Phenix (68), followed by visual inspection using COOT (69).

Glutaminase activity assay
To obtain the kinetic parameters for KGA Val124-Leu669, KGA Val124-Arg550, and GAC Met128-Ser603, a mixture containing 10 nM glutaminase, 50 mM Tris acetate, pH 8.6, 3 units of bovine L-glutamate dehydrogenase (Sigma), and 2 mM NAD (Sigma) was pipetted into 96-well plates previously filled with 6 or 12 serial dilutions of L-glutamine, to achieve a range of concentrations from 60 to 0.15 mM. K₂HPO₄ (2 M stock, pH 9.4) was added to the mixture at a final concentration of 20 mM. The formation of NADH was tracked by absorbance readings at 340 nm, for up to five consecutive minutes, at room temperature. The initial velocities, in picomoles of NADH produced per second, were calculated using an extinction coefficient for NADH of 6,220 M⁻¹ cm⁻¹ at 340 nm and a path length of 0.5 cm. The total volume per reaction was 200 μl. The plate reader used was an EnSpire (PerkinElmer Life Sciences). Measurements were done in triplicate and analyzed using GraphPad Prism 5.00 (GraphPad Software, San Diego, CA).
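The conversion from absorbance change to initial velocity follows from the Beer-Lambert law and can be sketched as below. The extinction coefficient, path length, and reaction volume are the values stated above; the function names and the example readings are hypothetical, and the slope is estimated here by a simple least-squares fit over the whole time course, which is an assumption about how the linear phase was selected.

```python
EPSILON_NADH = 6220.0        # M^-1 cm^-1 at 340 nm (value stated in the text)
PATH_LENGTH_CM = 0.5         # path length stated in the text
REACTION_VOLUME_L = 200e-6   # 200 microliters per reaction


def absorbance_slope(times_s, absorbances):
    """Least-squares slope of absorbance versus time (absorbance units per second)."""
    n = len(times_s)
    mean_t = sum(times_s) / n
    mean_a = sum(absorbances) / n
    stt = sum((t - mean_t) ** 2 for t in times_s)
    sta = sum((t - mean_t) * (a - mean_a) for t, a in zip(times_s, absorbances))
    return sta / stt


def nadh_rate_pmol_per_s(slope_per_s,
                         epsilon=EPSILON_NADH,
                         path_cm=PATH_LENGTH_CM,
                         volume_l=REACTION_VOLUME_L):
    """Convert dA/dt into picomoles of NADH formed per second.

    Beer-Lambert: A = epsilon * path * concentration, so
    d[NADH]/dt = (dA/dt) / (epsilon * path); multiplying by the reaction
    volume gives mol/s, and by 1e12 gives pmol/s.
    """
    molar_rate = slope_per_s / (epsilon * path_cm)
    return molar_rate * volume_l * 1e12


# Hypothetical readings taken every 60 s, for illustration only.
example_times = [0.0, 60.0, 120.0, 180.0, 240.0]
example_abs = [0.100, 0.160, 0.220, 0.280, 0.340]
example_rate = nadh_rate_pmol_per_s(absorbance_slope(example_times, example_abs))
```

With these constants, an absorbance increase of 0.001 per second corresponds to roughly 64 pmol of NADH per second.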

Small-angle X-ray scattering
Scattering data were collected at λ = 1.488 Å, for a sample-detector distance of 1.1 m covering the momentum transfer range 0.015 < s < 0.442 Å⁻¹ (s = 4π sin θ/λ, where 2θ is the scattering angle). The data were normalized to the intensity of the incident beam and corrected for the detector response using an in-house program. Two frames of 250 s were collected and compared for radiation damage using the program PRIMUS (70). The same program was used to average the frames and subtract the buffer. The different protein concentrations were evaluated for aggregation by following increases in the measured Rg (radius of gyration) as calculated by AutoRg. The Rg was confirmed by using the indirect Fourier transform program GNOM (71), which was also used to calculate the distribution function P(r) and Dmax. The data were analyzed and processed, including ab initio construction and model averaging, using the programs contained in the ATSAS package (72).
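The Rg estimate rests on the Guinier approximation, ln I(s) = ln I(0) - (Rg² s²)/3, which holds at small angles (commonly taken as s·Rg below about 1.3). The sketch below is a simplified illustration of that relationship and is not the AutoRg algorithm, which additionally selects the fitting range automatically. The function names and the synthetic curve are hypothetical.

```python
import math


def momentum_transfer(two_theta_rad, wavelength_angstrom):
    """s = 4*pi*sin(theta)/lambda, where 2*theta is the scattering angle."""
    return 4.0 * math.pi * math.sin(two_theta_rad / 2.0) / wavelength_angstrom


def guinier_fit(s_values, intensities):
    """Least-squares fit of ln I(s) = ln I0 - (Rg^2 / 3) * s^2.

    Returns (Rg, I0). The caller is assumed to pass only points that lie
    inside the Guinier region.
    """
    xs = [s * s for s in s_values]
    ys = [math.log(i) for i in intensities]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    if slope >= 0:
        raise ValueError("non-negative Guinier slope; data unsuitable for an Rg estimate")
    intercept = mean_y - slope * mean_x
    return math.sqrt(-3.0 * slope), math.exp(intercept)


# Synthetic, noise-free curve with known parameters, for illustration only.
TRUE_RG, TRUE_I0 = 40.0, 100.0
s_grid = [0.015 + 0.001 * k for k in range(16)]  # 0.015 to 0.030 A^-1, s*Rg <= 1.2
i_grid = [TRUE_I0 * math.exp(-((TRUE_RG * s) ** 2) / 3.0) for s in s_grid]
rg_est, i0_est = guinier_fit(s_grid, i_grid)
```

An Rg that rises with protein concentration in such a fit is the signature of aggregation that the concentration series is meant to detect.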

Cryo-electron microscopy
For visualization of cryogrids, purified GAB samples were frozen onto a Gatan 626 sample holder, prepared with an FEI Vitrobot Mark II (force of −5 for 2 s). Images were acquired using a JEM 2100 (200 kV) electron microscope with a LaB6 filament and a Gatan 4k × 4k slow scan CCD camera (US4000). Exposure time was 1 s/frame, with a dose of 20 e⁻ Å⁻² s⁻¹. The micrographs were processed using IMAGIC (73) and EMAN 2.1 (74). Iterative stable alignment and clustering (ISAC) (75) was used to generate reference-free class averages from both the IMAGIC stack (25,000 particles) and the EMAN2.1 stack (33,608 particles). Using IMAGIC and UCSF Chimera (76), 10,000 projection images in random orientations of the GAC crystallographic model (PDB code 3SS3) were generated, in both dimer and tetramer configurations. The bounding rectangle dimensions of 200 particles randomly extracted from the IMAGIC, EMAN2.1, ISAC/IMAGIC, and ISAC/EMAN2.1 datasets (50 from each) were classified according to their longer (L) and shorter (S) dimensions. Images were low-pass filtered to reduce noise. Only the particles contained in the 50 "best" class averages generated by each program were considered (total: 200 particles). Measurements were performed in Digital Micrograph based on the integration profile across each perpendicular direction, as shown in Fig. 5C. Measurements were taken as the distance, in pixels, between the valleys confining the particle signal. For comparison, we also measured 200 randomly selected projections of the GAC atomic model in the dimer and in the tetramer configuration.