The ACT Domain: A Small Molecule Binding Domain and Its Role as a Common Regulatory Element*

The ACT domain is a structural motif of 70–80 amino acids and is one of a growing number of intracellular small molecule binding domains that function in the control of metabolism, solute transport, and signal transduction (1–5). The first structure of an ACT domain was determined in 1995 with the crystal structure of Escherichia coli D-3-phosphoglycerate dehydrogenase (6), a tetrameric protein containing one ACT domain per subunit. This structure represents the archetypical ACT domain and is composed of four β strands and two α helices arranged in a βαββαβ fold as shown in Fig. 1. However, it was not recognized as a recurring motif until 1999, when Aravind and Koonin (2) proposed this based on a PSI-Blast (position-specific iterated Blast) sequence data base search using the small subunit (IlvN) of acetolactate synthase. This search identified a diverse group of proteins that were noted to be involved in some way in amino acid and purine metabolism and were regulated by specific amino acids. They named this proposed domain the ACT domain after the first letters of three of the proteins, aspartate kinase-chorismate mutase-tyrA (prephenate dehydrogenase).
In 2001, the structure of the Lrp-like transcriptional regulator from Pyrococcus furiosus was published (7). This was the first structure of a transcription factor that contained an ACT domain. A PSI-Blast analysis of its sequence by Ettema et al. (3) revealed an additional group of proteins, including both enzymes and transcription regulators, that they proposed contained a novel type of ACT domain, which they named the RAM domain for regulator of amino acid metabolism. The Lrp-like transcriptional regulator contains the ACT domain fold (βαββαβ), but the sequence alignment resulting from the PSI-Blast search appeared to reveal a somewhat different pattern of conservation of residue type (3) (Fig. 2). Mutagenesis data also suggested that the ligand binding sites of the Lrp-like protein were located differently than in the ACT domains described previously. Although the ACT domain from E. coli D-3-phosphoglycerate dehydrogenase and the Lrp-like transcription factor superimpose very well (1.8-Å root mean square deviation), the dimer interfaces of each are significantly different. The ACT domain dimers of phosphoglycerate dehydrogenase form a side-by-side structure producing an extended 8-stranded β sheet (Fig. 3A). On the other hand, the β sheets of the ACT domain dimers of the Lrp transcription factor assume a more face-to-face configuration (Fig. 3J). These observations formed the basis for the proposed division into ACT and RAM domains. As will be seen later in this review, recently determined structures demonstrate that the ACT domain shows an increasing diversity in tertiary and quaternary architecture as well as ligand binding interactions.
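For reference, the root mean square deviation quoted above is conventionally computed over N equivalent atom pairs after optimal superposition of the two structures. The notation below is generic and is an assumption for illustration: x_i and y_i stand for the coordinates of the i-th matched atom in each structure, and the text does not state which atoms were used for the 1.8-Å value.

```latex
\mathrm{RMSD} = \sqrt{\frac{1}{N}\sum_{i=1}^{N} \lVert \mathbf{x}_i - \mathbf{y}_i \rVert^{2}}
```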
A novel type of ACT domain-containing protein family whose members contain ACT domain repeats has also been identified by sequence analysis in Arabidopsis (8). These proteins were termed ACR proteins. There are at least 8 genes in the 5 chromosomes of Arabidopsis that belong to the "ACR" protein family. Proteins similar to the ACR family have also been identified in rice (Oryza sativa) (9).
The majority of ACT domain-containing proteins appear to interact with amino acids and are involved in some aspect of regulation of amino acid metabolism (2-4) (Fig. 1). These include both metabolic enzymes and transcription regulators. In fact, the expression of some ACT-containing enzymes is under the control of ACT-containing transcription regulators. This has resulted in the ACT domain being referred to as "the regulatory domain in amino acid metabolism" in the SCOP (structural classification of proteins) data base. However, notable exceptions to this generality have been revealed in recent years. These include the NikR transcriptional regulator that binds nickel and functions in the regulation of intracellular nickel levels (10) and the YkoF protein (11) that binds thiamine and is thought to be involved in thiamine transport.
It is noteworthy that the 80–90-amino acid long ribonucleoprotein motif of RNA binding proteins (12, 13) also possesses the same βαββαβ fold as the ACT domain. These domains bind RNA through interaction at the face of their β-sheet structure rather than binding small molecules in loop regions like the ACT domains. It is not known how these very similar domains may be related evolutionarily, but their similarity is intriguing. The RNA binding domains have their own unique pattern of conserved consensus sequences, and the PSI-Blast searches that identified the ACT and RAM domains did not appear to select any RNA binding domains.

ACT Domain Sequence Homology
A recent search of the "protein family data base" (Pfam) listed 3779 proteins as containing the ACT domain. These represent at least 57 different architectures (Pfam accession number PF01842), and ACT domain proteins are found in Bacteria, Archaea, and Eukaryota, including vertebrates, plants, fungi, and single-celled organisms. However, the ACT domain is not found in all organisms nor is it used consistently within species or in all proteins with like function (1–4).
The sequence homology among the ACT domains of proteins is not immediately obvious and went undetected until more sensitive sequence comparison methods, such as PSI-Blast, were developed. Because of the large number of alignments now available, they can only be presented here in a limited fashion. The reader is referred to the original articles (2–5) as well as the Pfam data base for additional information. Fig. 2 presents the sequence alignments of representative ACT domains including all of those whose structures have been determined. The structures provide the additional advantage of allowing the secondary structure elements to be considered in the alignment. The amino acid residues are highlighted by residue type, following closely the residue-grouping pattern originally presented by Aravind and Koonin (2) and later by Ettema et al. (3). When the pattern of secondary structure is also taken into consideration, it can be seen that sequences of the RAM domain group are not inconsistent with the original ACT domain alignment (2). To illustrate this, Fig. 2 presents four putative RAM domains highlighted first according to the ACT domain alignment of Aravind and Koonin (2) and then according to the RAM domain alignment of Ettema et al. (3). A common characteristic of the initial alignment of Aravind and Koonin (2) was the presence of a nearly invariant glycyl residue at the turn between the first β strand and the first α helix that coincided with the binding site for L-serine in phosphoglycerate dehydrogenase. This led them to propose a common ligand binding mode for all ACT domains.
The RAM domain proteins are missing the glycyl residue in this location. This coincides with the presence of another highly conserved glycyl residue in the loop between the β2 and β3 strands. Although the RAM domain proteins share many features of the ACT domain alignment, the β2-β3 loop area does appear to be rather distinctive for the RAM grouping (Fig. 2).

ACT Domain Structures
To date, the structures of at least 10 ACT domain-containing proteins have been determined (Fig. 3), and five of these have been solved with bound ligand. Even this limited number of structures displays a remarkable diversity in how the domains associate.
The structures range from stand-alone ACT domains, as seen for the YbeD protein from E. coli (14) (Fig. 3K), to large multimeric proteins with ACT domains either isolated or associating into groups of 2, 3, or 4 (6, 7, 10, 11, 14–19, 21, 22). The archetypical ACT domain association with two side-by-side domains forming an extended β sheet is present in E. coli phosphoglycerate dehydrogenase (6) (Fig. 3A), aspartate kinase from Arabidopsis (22) (Fig. 3B), and the E. coli IlvH regulatory subunit of acetohydroxyacid synthase (18) (Fig. 3H). The Arabidopsis aspartate kinase also contains an ACT domain mimic in a side-by-side dimer arrangement (22). However, it cannot be considered a true ACT domain because it is not formed from a sequential fold. It also does not bind a ligand. It seems to play a purely structural role, being part of the dimer interface.
Fig. 2. The sequences are aligned with regard to residue type and secondary structure. The upper group is a sampling of ACT domains using the alignment scheme presented by Aravind and Koonin (2). The middle group is a sampling of ACT domains that were classified as RAM domains by Ettema et al. (3) using the same alignment scheme as in the upper group. The lower group has the same proteins as in the middle group but uses the alignment scheme presented by Ettema et al. (3) for RAM domains. The residue numbers of the ACT domains within the protein sequence are given, and the asterisk denotes that the structure has been determined (see Fig. 3). The residues in bold designate those within the secondary structure elements as listed at the top of the figure. The color scheme generally follows that of Aravind and Koonin (2) and Ettema et al. (3) and refers to the following residue types: green, hydrophobic (ILVCAGMFYWTP); magenta, polar (HKREQDNST); gray, large (FILMWYKREQ); yellow, small (ACGSTDNVP); and red, conserved glycine or glycine next to ligand binding site. The protein abbreviations are: PGDH, D-3-phosphoglycerate dehydrogenase; IlvH, the regulatory subunit of acetohydroxyacid synthase; PheOH, phenylalanine hydroxylase; AspKi, aspartate kinase; Glycle, glycine cleavage system transcription regulator; ATPPRT, ATP-phosphoribosyltransferase; NikR, NikR transcription regulator; YbeD, YbeD protein; YkoF, YkoF thiamine binding protein; ACR1, first ACT repeat of ACR protein 1.
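The residue-type grouping listed in the Fig. 2 legend can be written down as a small lookup, which may help when checking which classes a given alignment column satisfies. This is only an illustrative sketch: the four group memberships are copied from the legend, whereas the function names `residue_groups` and `column_groups` and the treatment of `-` as a gap character are assumptions made here, not part of the original alignment procedure.

```python
# Residue groups copied from the Fig. 2 legend (one-letter amino acid codes).
# Groups overlap: a residue can belong to several of them.
RESIDUE_GROUPS = {
    "hydrophobic": set("ILVCAGMFYWTP"),
    "polar": set("HKREQDNST"),
    "large": set("FILMWYKREQ"),
    "small": set("ACGSTDNVP"),
}


def residue_groups(residue):
    """Return the sorted names of every legend group that contains the residue."""
    code = residue.upper()
    return sorted(name for name, members in RESIDUE_GROUPS.items() if code in members)


def column_groups(column):
    """Return the groups shared by all residues in one alignment column.

    Gap characters ('-') are ignored (an assumption of this sketch).
    An empty or all-gap column yields an empty list.
    """
    residues = [r for r in column if r != "-"]
    if not residues:
        return []
    shared = set(residue_groups(residues[0]))
    for r in residues[1:]:
        shared &= set(residue_groups(r))
    return sorted(shared)


if __name__ == "__main__":
    print(residue_groups("G"))    # groups containing glycine
    print(column_groups("ILV-"))  # groups common to I, L, and V
```

Because the groups overlap, a column is better described by the set of classes all of its residues share than by a single label; this does not reproduce the red highlighting of conserved glycines, which depends on structural context.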
E. coli threonine deaminase is particularly interesting because it contains an ACT domain followed closely, within the same chain, by a βαββ fold; together they form a seven-stranded β sheet (19). Mutagenesis studies (20) indicate that the effector ligand (isoleucine) binding site is located at the interface between these two domains in a position analogous to L-serine in phosphoglycerate dehydrogenase.
The rat phenylalanine hydroxylase (21) (Fig. 3G) structure is a dimer where the two ACT domains do not associate with each other. However, the protein in this case is lacking the C-terminal 24 residues that are responsible for tetramer formation. Thus, a tetrameric structure could eventually show a dimeric interaction of ACT domains similar to some of the other structures.
Proteins whose ACT domains associate with their β strands in a more or less face-to-face arrangement include the Lrp-like transcription regulator (7), the NikR transcription regulator (10, 15), the glycine cleavage system repressor protein (unpublished data, Protein Data Bank code 1U8S), ATP phosphoribosyltransferase (16, 17), and the YkoF thiamine binding protein (11) (Fig. 3, J, D, I, C, and F, respectively). However, among these there is notable variation in the relative orientation of the domains. The β strands of the Lrp regulator face each other with a diagonal arrangement whereas those of the NikR regulator display a parallel arrangement of strands. ATP phosphoribosyltransferase displays a face-to-face arrangement of three ACT domains rather than two.
Two of the proteins, the YkoF and the glycine cleavage regulator, possess both the face-to-face and the side-by-side domain arrangements. Both proteins consist of two tandem ACT domains in each of two chains. They form the side-by-side arrangement between ACT domains within the same polypeptide and a face-to-face arrangement across the subunit interface (Fig. 3, F and I). However, as can be seen in Fig. 3, they do not adopt the same type of overall structure.
Fig. 3. Within the proteins the subunits are colored gray, green, and aqua with the ACT domains of each subunit colored red, dark blue, and dark green, respectively. For the YkoF protein (F), the IlvH protein (H), and the glycine cleavage system protein (I), the tandem ACT domains in each subunit are colored red and dark blue. The association of the ACT domains is depicted in more detail for proteins A-E. The arrows illustrate their approximate location in the structure. The β strands are in aqua and the helices in red. The ligands are depicted in CPK format.
The IlvH polypeptide also consists of two βαββαβ folds in tandem. The first of these, residues 2–75 from each chain, associate into a side-by-side extended sheet structure whereas the second, residues 81–150, forms a more face-to-face association. However, in this case the spacing is wider than in the face-to-face association in the Lrp protein (Fig. 3J), and the domains are offset so that their β sheets are not directly across from each other. Furthermore, the sequence alignment of the second domain in the IlvH protein (residues 81–150) does not match either the ACT or RAM alignments as well as the first. Although the glycine residue between β1 and α1 is present in the IlvH second domain, it is not well conserved throughout other species (18), and there is no conserved glycine found between β2 and β3. These second sets of domains may have lost their ligand binding function in favor of forming higher order quaternary structure. The crystal lattice of the IlvH protein shows the dimers forming octamers through association of this second domain with magnesium ions.

Ligand Binding Sites
The five structures that have been solved with ligands indicate that the ligands tend to bind at the interfaces between ACT domains. There appears to be a correlation of the ligand binding sites with specific glycine residues located in loops between the helices and strands. L-Serine binds to the phosphoglycerate dehydrogenase ACT domains primarily at the loop between β1 and α1. This location is also where L-lysine binds in aspartate kinase and has been determined by mutagenesis to be the probable location of L-valine binding to IlvH (18). The Arabidopsis aspartate kinase ACT domain also binds S-adenosylmethionine at a second site in the β3-α2 loop that seems to act synergistically with the lysine. This is the only known case of an ACT domain binding more than one kind of ligand. These three proteins are also the only ones that exhibit the extended 8-stranded β sheet structure and possess a conserved glycyl residue in the β1-α1 loop region. Threonine deaminase, which has an ACT-like side-by-side dimer, also possesses a glycine residue in this location. The sequences identified as representing RAM domains (3) contain a nearly invariant glycyl residue in the loop between the β2 and β3 strands. This is the region that mutagenesis data suggest is involved in ligand binding in the Lrp-like regulator.
ATP-phosphoribosyltransferase contains neither glycyl residue and is the only available structure where the ACT domains form a triad arrangement. In this protein, the effector, L-histidine, appears to associate mainly with the loop between the α1 helix and β2 strand. Interestingly, there is a glycyl residue found in this region that is positioned next to the bound histidine.
The YkoF structure is somewhat different in that it contains two successive ACT domains on the same polypeptide chain. Two of these chains associate to form a dimer that binds four ligand molecules. The main determinants for binding each ligand appear to be contributed largely by only one of the chains because there is little apparent hydrogen bonding of the ligand to the adjacent chain as is seen in most other ACT domain proteins. However, the adjacent chain does have a glycyl residue (Gly-88) in the loop between the ACT domains that is next to where two of the ligands bind. The second two ligand binding sites are opposite the C terminus of the adjacent chains. There is no glycyl residue present here, but there is also less structural obstruction at the terminus of the chain.
The NikR transcription regulator appears to be an exception. The nickel ion binds at the loop between β2 and β3, similar to the putative ligand binding location in the Lrp-like transcription regulator (RAM domain), but instead of having a glycyl residue at this location, the nickel chelates to the side chain of a histidine residue found in the analogous position.
Thus, there are at least four different ligand binding motifs that can potentially be discerned from the five available ligand-bound structures and the PSI-Blast sequence alignments. All but one involve the accommodation of enough space for the ligand to occupy the pocket by the absence of a side chain (Gly). The other is the only instance so far of a metal ion binding to an ACT domain by direct chelation with amino acid side chains.

ACT Domain Mechanism
The mechanism by which ACT domains translate the ligand binding event to the active site of enzymes or to the protein-DNA binding site in transcriptional regulators is unclear. Some metabolic enzymes containing ACT domains are inhibited by the ligands that bind whereas others are activated. Some show K-type allosteric effects whereas others display V-type effects. The uneven distribution of ACT domain proteins among and within species suggests that the ACT domain modules may have been recruited independently by each protein. If this is the case, do the ACT domains also function differently from protein to protein or is there a common mechanism?
Crystal structures of E. coli D-3-phosphoglycerate dehydrogenase (6,23) illustrate that a major effect of serine binding is the retardation of a rotation of the substrate binding domain relative to the nucleotide binding domain. The binding of phenylalanine to phenylalanine hydroxylase, along with phosphorylation of Ser-16, appears to stabilize the structure so that the binding pocket remains open to accept the substrate (21). Histidine binding to ATP-phosphoribosyltransferase synergistically enhances the inhibition of the enzyme by AMP and ADP and is associated with the conversion of a dimeric form to a hexamer (16,17). Stabilization of the hexamer by histidine results in the active sites being partially closed and inaccessible. Crystal structures of the NikR protein from E. coli (10) and Pyrococcus horikoshii (15) have led to the suggestion that nickel binding may serve to stabilize the structure, reducing its flexibility, and thus may promote DNA binding.
If there is a common theme in the regulation of ACT domain proteins, it may be that ligand binding reduces the flexibility of the structures. However, ACT domain proteins are only beginning to be studied in detail, and there is not yet enough information available.
An apparent third type of ACT domain protein consists of only the ACT domain, either as a stand-alone protein or in multiple copies. Examples of these are the YbeD protein and the YkoF thiamine binding protein. One possibility, as has been postulated for YkoF, is that they function as ligand transporters. Alternatively, they may sequester the ligands until their intracellular levels fall to a point at which they are needed.

Summary
ACT domains are found in a wide variety of proteins and in a variety of different arrangements. The majority of these bind amino acids that function to regulate some aspect of amino acid metabolism. Most ACT domain-containing proteins have been discovered in Bacteria and Archaea, indicating that the domain appeared very early on. At least one functional example has been found in a mammalian protein, rat phenylalanine hydroxylase. Interestingly, ACT domains are not found in all organisms nor are they used consistently within species or found in all proteins possessing a common activity. In their original article, Aravind and Koonin (2) suggested that the ACT domain represents a "conserved, evolutionarily mobile module" that when fused to other proteins made them susceptible to regulation by small molecules. The large variety of ACT domain-containing proteins is consistent with this idea, but the domain does not appear to be used universally within a single species. The preponderance of ACT domains that bind amino acids suggests that this was the domain's original function. However, there are clear examples showing that it is evolving to bind other ligands and perhaps to participate in the regulation of other processes. One should anticipate that the realm of the ACT domain as a regulatory element will continue to expand as more structures are determined and more is learned about their respective functions. What was once just a very short story is developing into a drama with many ACTs.