Crystal Structure of the HNF4α Ligand Binding Domain in Complex with Endogenous Fatty Acid Ligand*

HNF4α is an orphan member of the nuclear receptor family with prominent functions in liver, gut, kidney and pancreatic β cells. We have solved the x-ray crystal structure of the HNF4α ligand binding domain, which adopts a canonical fold. Two conformational states are present within each homodimer: an open form with α helix 12 (α12) extended and collinear with α10 and a closed form with α12 folded against the body of the domain. Although the protein was crystallized without added ligands, the ligand binding pockets of both closed and open forms contain fatty acids. The carboxylic acid headgroup of the fatty acid ion pairs with the guanidinium group of Arg 226 at one end of the ligand binding pocket, while the aliphatic chain fills a long, narrow channel that is lined with hydrophobic residues. These findings suggest that fatty acids are endogenous ligands for HNF4α and establish a framework for understanding how HNF4α activity is enhanced by ligand binding and diminished by MODY1 mutations.
Nuclear receptors are ligand-activated transcription factors that regulate such diverse physiological processes as reproduction, development, and metabolism. Ligand binding induces conformational changes that coordinately dissociate corepressors and recruit coactivators to enhance transcriptional activity (1,2). Physiologically relevant ligands, which are known for fewer than half of the 48 nuclear receptors encoded by the human genome, include the steroid hormones, retinoids, thyroid hormone, vitamin D3, and fatty acids. HNF4α is considered to be an orphan member of the nuclear receptor superfamily because its endogenous ligand is not known (3). Originally identified in liver, HNF4α is also present in kidney, gut, and pancreatic islets. It functions in transcriptional cascades, downstream of TGF-β-activated SMAD signaling and GATA and HNF3/forkhead family transcription factors and upstream of the HNF1α/POU homeodomain transcription factor. In addition to HNF1α, the many target genes for HNF4α in liver include coagulation factors and proteins involved in lipid and cholesterol metabolism and transport. Selective targets present in kidney and intestine include erythropoietin and intestinal fatty acid binding protein, respectively. Targeted disruption of Hnf4α in mice leads to defective gastrulation, underscoring an important role for HNF4α in the developing gut (4). In humans, the clinically apparent phenotype associated with Hnf4α mutations is a Mendelian form of diabetes mellitus associated with abnormalities in lipoprotein and lipid concentrations (5,6).
Positional cloning methods revealed the relationship between HNF4α and an atypical form of diabetes referred to as maturity onset diabetes of the young (MODY1) (5). Mutations in at least six distinct genes have been linked to MODY (7). Although MODY1 is rare (<20 families have been identified) (5,8,9), the associated phenotype is instructive. Like other forms of MODY, MODY1 is characterized by autosomal-dominant inheritance, early onset (<25 years of age), and abnormal pancreatic β cell function. Because reduced β cell function leads to insulin deficiency, it is reasonable to conclude that HNF4α activity is tightly coupled with insulin synthesis and secretion. Since nuclear receptors are an established and valuable class of drug targets, it follows that selective agonists for HNF4α might be useful as new therapeutic strategies for improving insulin secretion in MODY patients and potentially others with more prevalent forms of type 2 diabetes. To learn more about how HNF4α functions in health and disease, and as an important step in drug discovery, we have crystallized the ligand binding domain (LBD) of HNF4α and solved its three-dimensional structure. HNF4α forms an antiparallel, three-layered α helical sandwich akin to other nuclear receptor LBDs. The serendipitous presence of a tightly associated fatty acid in the binding pocket reveals potential mechanisms for ligand binding.

MATERIALS AND METHODS
Protein Production and Crystallization-DNA encoding the LBD of rat HNF4α (residues 133-382) was subcloned into the pET 28a vector (Novagen) by PCR. Protein was expressed in Escherichia coli BL21(DE3) (Invitrogen), grown either in LB or synthetic medium containing L-selenomethionine (SeMet), and isolated from bacterial lysates using Talon cobalt affinity resin (Clontech). The His6 affinity tag was removed with bovine thrombin (10 units/ml), and the protein was further purified by ion exchange chromatography (Mono Q fast protein liquid chromatography). Crystals were obtained at room temperature by the vapor diffusion method in 10-μl sitting drops containing equal volumes of protein (25 mg/ml) and crystallization buffer (0.1 M sodium citrate (pH 8.0), 0.7 M ammonium acetate, 16% MPD (2-methyl-2,4-pentanediol), and 10 mM dithiothreitol).
* This work was supported in part by National Institutes of Health Grant R01 DK43123 (to S. E. S.). The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
‖ To whom correspondence should be addressed. E-mail: Steven.Shoelson@joslin.harvard.edu.
Data Collection and Structure Determination-Diffraction data were collected at 100 K at the National Synchrotron Light Source, Brookhaven, NY (beamline X12C). Oscillation images were collected every 1°, and the data were integrated and scaled using the DENZO HKL software package (10). The structure was determined using MAD phasing of SeMet-substituted protein. The expected Bijvoet and dispersive differences (11) based on six bound seleniums in a monomer (250 residues) were 5.90 and 3.64%, respectively. The program SOLVE (12) found 20 of 24 possible selenium sites. The phases were improved by 4-fold averaging and solvent flattening utilizing the programs MLPHARE and DM in the CCP4 suite (13). The model was built using O (14) and refined with CNS (15), applying restrained 4-fold non-crystallographic symmetry, overall B-factor corrections, and bulk solvent corrections (Table I).
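The expected Bijvoet and dispersive differences quoted above can be approximated with a commonly used estimate in which both ratios scale with sqrt(N_A / 2N_T), where N_A is the number of anomalous scatterers and N_T the total number of non-hydrogen atoms. The sketch below is illustrative only: the average of 7.8 non-hydrogen atoms per residue, the effective scattering factor Z_eff = 6.7, and the selenium f'' and Δf' values are assumptions that do not come from this paper, and the formulation in reference 11 may differ in detail.

```python
import math

# Assumed constants (not taken from the paper).
Z_EFF = 6.7               # effective normal scattering per non-H protein atom, in electrons
ATOMS_PER_RESIDUE = 7.8   # assumed average number of non-H atoms per residue


def anomalous_ratios(n_anom, n_residues, f_double_prime, delta_f_prime):
    """Estimate (Bijvoet ratio, dispersive ratio) as fractions of |F|."""
    n_total = n_residues * ATOMS_PER_RESIDUE
    scale = math.sqrt(n_anom / (2.0 * n_total))
    bijvoet = scale * 2.0 * f_double_prime / Z_EFF
    dispersive = scale * abs(delta_f_prime) / Z_EFF
    return bijvoet, dispersive


def possible_sites(scatterers_per_monomer, monomers_per_asu):
    """Number of anomalous sites expected in the asymmetric unit."""
    return scatterers_per_monomer * monomers_per_asu


if __name__ == "__main__":
    # f'' = 5.0 and delta f' = 6.0 electrons are hypothetical selenium values.
    b, d = anomalous_ratios(6, 250, f_double_prime=5.0, delta_f_prime=6.0)
    print(f"Bijvoet: {100 * b:.2f}%  dispersive: {100 * d:.2f}%")
    # Six seleniums per monomer and four monomers per asymmetric unit.
    print("possible sites:", possible_sites(6, 4))
```

With six seleniums per monomer and four monomers per asymmetric unit, `possible_sites` reproduces the 24 possible sites mentioned in the text. The percentage estimates depend strongly on the assumed f'' and Δf' values, so they should be read as order-of-magnitude checks.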
Fatty Acid Analyses-HNF4α samples (200 mg) were combined with a 10-fold excess (v:v) of 15% NaOH in 50% methanol and heated for 30 min at 100°C. Esterification was accomplished by combining the cooled solutions with 20 volumes of 46% methanol in 6 N HCl and heating for 10 min at 80°C. The fatty acid methyl esters were extracted into 50% methyl t-butyl ether in hexane and analyzed by gas chromatography (GC) (Sherlock MIS; 170-270°C) at the Bacterial Identification and Fatty Acid Laboratory, University of Florida (plantpath.ifas.ufl.edu/fame).

RESULTS AND DISCUSSION
Domain Architecture and Structural Homology-The amino- and carboxyl-terminal boundaries of the LBD of HNF4α were delineated from sequence comparisons with other members of the nuclear receptor family. Although we attempted to express longer versions, including the entire LBD plus F-domain (133-455), these degraded to shorter fragments. Since susceptibility to proteolysis suggested that the F-domain is at least partially disordered, we expressed and crystallized the LBD (133-382) by itself. Crystals belonging to the tetragonal space group P4₁2₁2 (unit cell: a = b = 102.3, c = 227.7 Å) reached maximal dimensions of 0.3 × 0.2 × 0.2 mm in sitting drops within 5 days. The structure was solved using MAD phasing of selenomethionine-substituted protein (Table I).
Four protein molecules are assembled in the asymmetric unit as a pair of identical homodimers. Each LBD of HNF4α contains 9-10 α helices and two β strands that adopt the helical sandwich motif common to the LBDs of nuclear receptors (Fig. 1). The 194 core α carbons of the ligand binding domains from HNF4α and RXRα (Protein Data Bank number 1FBY) (16), a close structural relative, superimpose with a root mean square deviation of 1.26 Å. Following conventional nomenclature, α helix 2 (α2), which is variably present in nuclear receptor LBDs, is absent in HNF4α, α6 consists of a single α helical turn, and α10 and α11 are contiguous. Pro 333 forces a break in α10. Acidic residues Glu 261 and Asp 262 create a bulge in α5 that may be important for dimerization (17). Electron density is apparent for all residues except those at the amino (133-140) and carboxyl (368-382) termini of the domain and within the loop between α helices 1 and 3 (the 1/3 loop, residues 157-165).
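The packing implied by the reported unit cell and four molecules per asymmetric unit can be checked with a Matthews coefficient estimate. The sketch below assumes a molecular mass of about 27.5 kDa for the 250-residue construct (110 Da per residue, an approximation rather than a value from the paper) and uses the eight general positions of space group P4₁2₁2; the 1.23 constant is the usual approximation for converting the Matthews coefficient to a solvent fraction.

```python
def matthews(a, b, c, n_symops, mol_per_asu, mw_da):
    """Return (cell volume in cubic angstroms, Matthews coefficient, solvent fraction).

    Valid only for cells with all angles equal to 90 degrees,
    as in the tetragonal system.
    """
    volume = a * b * c
    vm = volume / (n_symops * mol_per_asu * mw_da)
    solvent_fraction = 1.0 - 1.23 / vm
    return volume, vm, solvent_fraction


if __name__ == "__main__":
    ASSUMED_MW_DA = 250 * 110.0  # hypothetical mass: 250 residues at ~110 Da each
    volume, vm, solvent = matthews(
        a=102.3, b=102.3, c=227.7,  # cell constants reported in the text
        n_symops=8,                 # general positions in P4(1)2(1)2
        mol_per_asu=4,              # four LBDs per asymmetric unit
        mw_da=ASSUMED_MW_DA,
    )
    print(f"V = {volume:.0f} A^3, Vm = {vm:.2f} A^3/Da, solvent = {100 * solvent:.0f}%")
```

Under these assumptions the Matthews coefficient falls within the range typically observed for protein crystals, which is consistent with four molecules in the asymmetric unit.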
Fatty Acids in the Ligand Binding Pocket of HNF4α-Although we did not add potential ligands to our crystallization trials, the initial electron density maps showed density in the ligand binding pocket of the domain that was not accounted for by the protein. The excess electron density, present in all molecules of the asymmetric unit, resembled a small molecule associated with the guanidinium group of Arg 226. Matrix-assisted laser desorption ionization and time of flight mass spectrometric analyses of HNF4α revealed, in addition to the expected parent ion for the expressed protein, the presence of non-covalently associated saturated and monounsaturated fatty acids (data not shown). To confirm the identities of the lipids and quantify relative amounts, purified HNF4α solutions were submitted for fatty acid analysis by GC. Recombinant rat HNF4α-(133-382) contained a mixture of fatty acids, including 16:0, 17:0 cyclo, 18:1 ω7c, and 14:0 (Fig. 2). Similar sets of fatty acids bound a longer form of rat HNF4α containing the entire carboxyl-terminal F domain (residues 133-455, data not shown) and the LBD of human HNF4α (residues 140-382), which differs from the rat domain at seven positions (Fig. 2). These findings demonstrate that both human and rat proteins associate spontaneously with endogenous fatty acids and that the F domain does not influence binding.
To determine whether the LBDs select specific fatty acids from the larger pool, we subjected E. coli BL21 cells to similar GC analyses (Fig. 2). The bacterial fatty acids present in greatest abundance also bound HNF4α, although there appears to be a degree of selectivity. For example, the fatty acids 16:0, 18:1 ω7c, 16:1 iso/14:0 3OH, and 19:0 cyclo ω8c are present in greater relative abundance in the bacteria than associated with human or rat HNF4α (Fig. 2) (16:1 iso and 14:0 3OH are not distinguished by GC). In contrast, fatty acids 17:0 cyclo, 16:1 ω7c/15 iso 2OH, 15 iso 3OH, and 10:0 were bound to the protein in higher relative abundance than their presence in E. coli. Other fatty acids such as 18:1 ω7c and 14:0 were neither selected for nor against (Fig. 2). Therefore, although a wide range of fatty acids were found within its ligand binding pocket, HNF4α appears to exhibit selectivity toward a subset of those found in bacteria. Future experiments with mammalian tissues are required to determine which fatty acids bind under relevant conditions and whether the capacity of HNF4α to serve as a fatty acid sensor has physiological or pathological ramifications.
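One way to make the comparison between protein-bound and bacterial fatty acid profiles explicit is to compute, for each species, the log2 ratio of its relative abundance in the protein sample to its relative abundance in E. coli. The sketch below is a hypothetical analysis, not the procedure used in this work: the function name, the 2-fold cutoff, and the percentages are placeholders invented for illustration and are not values read from Fig. 2.

```python
import math


def classify_selectivity(bound_pct, pool_pct, log2_cutoff=1.0):
    """Label each fatty acid as enriched, depleted, or neutral.

    bound_pct: percent of total fatty acids recovered from the protein sample.
    pool_pct:  percent of total fatty acids in the source cells.
    Species with a zero or missing value in either profile are skipped,
    because the ratio is undefined for them.
    """
    calls = {}
    for name, bound in bound_pct.items():
        pool = pool_pct.get(name, 0.0)
        if bound <= 0.0 or pool <= 0.0:
            continue
        log2_ratio = math.log2(bound / pool)
        if log2_ratio >= log2_cutoff:
            label = "enriched"
        elif log2_ratio <= -log2_cutoff:
            label = "depleted"
        else:
            label = "neutral"
        calls[name] = (round(log2_ratio, 2), label)
    return calls


# Placeholder percentages for illustration only (NOT data from Fig. 2).
EXAMPLE_POOL = {"16:0": 40.0, "17:0 cyclo": 10.0, "14:0": 5.0}
EXAMPLE_BOUND = {"16:0": 15.0, "17:0 cyclo": 30.0, "14:0": 5.0}

if __name__ == "__main__":
    for name, (ratio, label) in classify_selectivity(EXAMPLE_BOUND, EXAMPLE_POOL).items():
        print(f"{name:12s} log2(bound/pool) = {ratio:+.2f}  {label}")
```

A symmetric log2 cutoff treats enrichment and depletion equally; the choice of a 2-fold threshold is arbitrary and would need to be justified against replicate measurements.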
The fact that the fatty acids were persistently bound even after a multistep purification indicated that they were tightly associated. This was borne out by additional attempts to strip the lipids from the protein, as the fatty acids remained associated after dialysis and gel-filtration chromatography (data not shown).
The serendipitous presence of fatty acids provided an opportunity to analyze how ligands bind to HNF4α. The lipid carboxyl group was readily identified next to the side chain of Arg 226, and the methylene chain could be built to C12. Density beyond that point was weak, indicating that the remainder of the chain is either flexible or occupies multiple conformations. The cavity has an internal volume of 370 Å³, calculated by the program Voidoo (18), which is almost entirely occupied by fatty acid. While this volume is within the normal range for nuclear receptor LBDs, the elongated, relatively linear shape of the pocket is atypical. Both oxygens of the fatty acid head group are ion-paired with the guanidinium group of Arg 226, and one oxygen forms additional hydrogen bonds with the backbone NH of Gly 237 and side chain OH of Ser 181 (Fig. 3, A and B). In the "closed" conformation, α12 of HNF4α does not contact the ligand, in contrast with other nuclear receptors, where α12 in the canonical agonist-bound conformation often does contact the agonist.
Although HNF4α and RXRα are similar in terms of amino acid sequence, their ligand binding specificities are distinct. This is due in part to Phe 313 in RXRα, which points inward and directs bound 9-cis-retinoic acid upward toward α5 (19). The corresponding residue in HNF4α is Ala 223, which, presumably because of the decreased bulk of the side chain, allows the fatty acid ligand to curl downwards toward α7. It is thus particularly interesting to note that F313A substitution in RXRα leads to constitutive activation, due to the spontaneous binding of an endogenous fatty acid as we have seen for wild-type HNF4α (19). The fatty acid in the binding pocket of RXRα F313A adopts a U shape, and density is interpretable out to the C18 position. In contrast, the side chains from residues Met 182 and Met 342 fill the upper portion of the HNF4α ligand binding pocket to prevent the fatty acid from looping around to form a U, and we do not see density past the C12 position. These two methionine residues of HNF4α are unique among nuclear receptors.

FIG. 2. ... HNF4α-(140-385), and E. coli BL21 cells, converted to the corresponding methyl esters, and analyzed by GC. The results are plotted as percentages of total and sorted according to relative abundance in bacteria. 12:0, 14:0, and 16:0 are saturated lauric, myristic, and palmitic acids, respectively; 16:1 ω7c is monounsaturated palmitoleic acid; cyclo refers to a cyclopropyl group; 2OH and 3OH have hydroxyl groups at the 2 or 3 positions, respectively; and iso has an extra methyl group on the penultimate carbon. Both fatty acids are listed when mixtures could not be resolved.