Open Chromatin Profiling in Mice Livers Reveals Unique Chromatin Variations Induced by High Fat Diet*

Background: Metabolic diseases result from a combination of multiple genetic and environmental factors. Results: High fat diet leads to chromatin remodeling in mice liver tissue in a strain-dependent manner. Conclusion: Diet can induce changes in the epigenome thereby contributing to metabolic disease. Significance: Environmental factors can contribute to complex disease progression through modifications to chromatin. Metabolic diseases result from multiple genetic and environmental factors. We report here that one manner in which environmental factors can contribute to metabolic disease progression is through modification to chromatin. We demonstrate that high fat diet leads to chromatin remodeling in the livers of C57BL/6J mice, as compared with mice fed a control diet, and that these chromatin changes are associated with changes in gene expression. We further show that the regions of greatest variation in chromatin accessibility are targeted by liver transcription factors, including HNF4α, CCAAT/enhancer-binding protein α (CEBP/α), and FOXA1. Repeating the chromatin and gene expression profiling in another mouse strain, DBA/2J, revealed that the regions of greatest chromatin change are largely strain-specific and that integration of chromatin, gene expression, and genetic data can be used to characterize regulatory regions. Our data indicate dramatic changes in the epigenome due to diet and demonstrate strain-specific dynamics in chromatin remodeling.

Diet is a key environmental factor involved in the development of many metabolic diseases. Indeed, excess consumption of calories from fats and refined carbohydrates has been associated with the development of obesity, nonalcoholic fatty liver disease, and diabetes (1). One of the organs most affected by diet is the liver, which is a central site of many metabolic processes (2,3). Given that obesity, nonalcoholic fatty liver disease, and related metabolic disorders are major health concerns, understanding the genetic and environmental factors that drive metabolic diseases is of utmost importance.
In response to external stimuli such as nutrients, specific transcription factors (TFs) 3 are engaged to regulate the expression of genes (4). Regions that are bound by TFs typically undergo chromatin modification, with rearrangement of local nucleosomes and modifications of histones in chromatin. These epigenetic changes can facilitate TF-DNA interaction (5). The accessibility of chromatin to TFs is cell type-specific and highly context-dependent (6,7). As such, chromatin accessibility profiling has been utilized to identify key transcriptional programs that promote differentiation and to uncover regulatory sites that influence specific disease states such as type 1 diabetes (8,9).
There are recent studies providing a link between environment and epigenetic marks (10,11). For example, long term high fat (HF) diet can alter DNA methylation in the promoter of the mouse Mc4r gene and, intriguingly, the effect of this epigenetic modification is strain-specific (12). Furthermore, exercise has been shown to alter DNA methylation patterns in human adipose tissue (13). However, the mechanisms linking environment and chromatin structure remain unclear. We describe here how consumption of an HF diet leads to chromatin remodeling in the liver at regulatory regions of the genome in a strain-specific manner.

EXPERIMENTAL PROCEDURES
Animals-Four-to-six-week-old C57BL/6J (B6) and DBA/2J (D2) mice were obtained from The Jackson Laboratory and placed on either a high fat (Research Diets D12266B) or a control (Research Diets D12489B) diet for 8 weeks. Body fat percentage and body weight were tracked as described previously (14). The animal protocols for the study were approved by the Institutional Care and Use Committee (IACUC) at the UCLA and the City of Hope.
FAIRE-seq-After 8 weeks of feeding, mice were humanely euthanized, and livers were harvested. Formaldehyde-assisted isolation of regulatory elements (FAIRE) was performed as described previously (15). Isolated DNAs from two biological replicates in each condition and strain were barcoded and sequenced on the Illumina HiSeq 2500 to produce 100 ϫ 100-bp paired-end reads. Sequenced reads were aligned to the mouse genome (version mm9) using Bowtie2 with default options, except for the use of local alignment allowing for one mismatch in the seed sequence (16). Overall, we obtained 38 -55 million aligned reads for B6 livers and 45-49 million aligned reads for the D2 livers. To confirm that the variability observed is not due to decreased genome mappability for the D2 genome as compared with the B6 (mm9 reference) genome, we aligned the sequenced reads to both the B6 reference genome and the D2 sequenced genome (17). We observed similar alignment rates for both genomes, indicating that the variability observed is not a technical artifact (B6 genome: B6, 91-98% alignment; D2, 94 -98%; D2 genome: B6, 92-98%; D2, 87-96%).
Aligned reads were further filtered to exclude improperly paired reads and PCR duplicates. To identify FAIRE peaks (sites) from reads, F-seq was used with default parameters and 1000-bp feature length (18). We utilized the irreproducible discovery rate framework (19) to find reproducible peaks across replicates.
To find the most variable sites, the read density at each site was determined in control and HF livers and sites were ranked comparing the density of read counts in HF with control (HF/ control). HNF4␣ chromatin immunoprecipitation sequencing (ChIP-seq) sites from B6 livers (20) and CTCF ChIP-seq sites (Mouse ENCODE) from B6 livers were obtained and overlapped with the most variable sites (37).
To assess accessibility differences of TF-bound and TF-unbound sites, HNF4␣, CEBP/␣, FOXA1, and CTCF sites were overlapped with all B6 FAIRE sites, and read densities comparing control with HF were determined (HF/control). Bootstrapping was used to determine the mean of each group.
ChIP-seq-ChIP experiments were performed with an antihistone H3 lysine 4 monomethylation (H3K4me1) antibody (Abcam, ab8895) using standard ChIP and our published protocols (21). Sequencing libraries compatible with Illumina HiSeq 2500 technology were generated using Illumina protocols. We obtained 55 million reads for livers of C57BL/6J control and 71 million reads for C57BL/6J HF. Sequences were aligned to the mm9 reference genome as with FAIREseq reads. Read densities were determined for the most variable B6 chromatin sites accessible in HF liver (top 1000 ranked by -fold change HF/control), invariable regions (mid 1000), and most variable B6 chromatin sites in control liver (bottom 1000).
RNA-seq-RNAs were extracted from the same livers as those used for FAIRE-seq using TRIzol (Invitrogen). RNAs were depleted of ribosomal RNA (Epicenter Ribo-Zero TM magnetic kit, catalog number MRZH11124). Eluted RNAs were prepared for sequencing using Illumina protocols and sequenced on a HiSeq 2500 (Illumina) to generate 100 ϫ 100-bp paired-end reads. We obtained 40 -55 million reads for B6 livers and 50 -55 million reads for D2 livers. The reads were aligned to the mm9 reference genome with TopHat (22) using the RefSeq gene annotation for transcript reference. Transcript abundance was quantified by fragments per kilobase of exon per million fragments mapped (FPKM) using Cufflinks (23). Cuffdiff (24) was used to identify differentially expressed (DE) transcripts from the RefSeq gene annotation. To identify enriched networks, data were analyzed through the use of Ingenuity Pathway Analysis (IPA) (Ingenuity Systems).
To determine the statistical significance of proximity of variable FAIRE sites to DE genes, we randomly simulated the placement of 334 genes and 2000 variable sites across the genome 10,000 times. We then calculated the distances between the nearest gene site combinations within 200 kb. This analysis leads to an empirical p value for 121 genes and 2000 variable FAIRE sites within 200 kb of p Ͻ 1 ϫ 10 Ϫ4 .
De Novo Motif Discovery-Motif discovery was performed with the top 1000 most variable sites (HF/control read densities) using DME2 (25). For both strains, we performed motif discovery on the top 1000 HF-specific FAIRE peaks with 10,000 random selected regions of length 500 bp used as background in DME. To mitigate the impact of strain-specific sequence variants on motif discovery, we created a pseudo-D2 reference genome by substituting known D2 single nucleotide variants into the mm9 reference genome (B6 genome) and used this for D2 analysis. We compared the de novo motifs to known TRANSFAC (26) motifs using MatCompare (27) with Kullback-Leibler divergence, as described previously (28). The motifs were then ranked by relative error rate (average of false positive and false negative rates adjusted for size of foreground and background sequences), as described previously (29).
Reporter Assays-To assess the enhancer function of the Lpin1 enhancer locus, both the B6 locus variant and the D2 locus variant were amplified from B6 genomic DNA and D2 genomic DNA, respectively, with the following primers: 5Ј-GCTAGCCTCGAGGATATCGGCAGGTCTTTGTGAG-TTCAAGG-3Ј and 5Ј-AGGCCAGATCTTGATATCCCTC-CTTCCAAACTTTCCTCC-3Ј. The amplicons were inserted into the EcoRV locus of pGL4.23 (Promega), with In-Fusion reaction (Clontech). The two reporters and the control, pGL4.23 vector alone, were transfected into HepG2 cells with Lipofectamine 1000 according to the manufacturer's instructions. Transfected cells were analyzed for expression of luciferase with the luciferase assay system (Promega). Expression of luciferase was quantified by the Tecan system.

RESULTS
C57BL/6J (B6) mice were placed on HF diet or control diet for 8 weeks. We profiled chromatin accessibility using FAIREseq in livers of B6 mice fed HF or control diets and identified reproducible peaks (sites) between biological replicates (see "Experimental Procedures"). We identified 28,484 sites in control and 28,253 sites in HF livers. We examined read densities at FAIRE sites for each diet and found high correlations between biological replicates (control: r 2 ϭ 0.99; HF: r 2 ϭ 0.99). Overall, we found 37,164 union FAIRE sites genome-wide with approximately two-thirds of these common across diets (Fig. 1A).  (4) Androgen and estrogen metabolism (5) Propanoate metabolism (4) Phosphatidylinositol signaling system (9) Steroid hormone biosynthesis (5) Glycerolipid metabolism (5) Glycerolphospholipid metabolism (6) Adipocytokine signaling pathway (6) Tight junction (9) Insulin signaling pathway (9) Enriched Kegg Pathway Terms (number of genes) Given that open chromatin sites can be regulatory elements influencing gene expression (30), we examined the localization of variable sites relative to known genes. We isolated the top 1000 sites in terms of FAIRE-seq read density change for HF over control (Fig. 1B) and found these top sites to be within 10 kb of 666 RefSeq annotated genes. Notably, these genes are enriched for many metabolic pathways, including glycerolipid metabolism and insulin signaling pathway (Fig. 1C). We next addressed whether these variable regions may potentially serve as regulatory elements in cis to influence gene expression. Using RNA sequencing, we identified 334 DE genes (false discovery rate Ͻ0.1) in the same samples. Identification of enriched molecular networks with these DE genes indicated the top networks to be lipid metabolism, molecular transport, and small molecular biochemistry, consistent with these genes being influenced by an HF diet. We found 121 of

Diet-induced Chromatin Remodeling in Liver
HNF4α CTCF the DE genes proximal (within 200 kb, p Ͻ 1 ϫ 10 Ϫ4 ) to the most variable FAIRE sites (2000 sites: top 1000 and bottom 1000 HF/control) (Fig. 1D). These data suggest potential interactions between the variable FAIRE sites and expression of proximal genes. We further evaluated whether FAIRE sites identified by our analyses overlapped with liver expression quantitative trait loci (eQTL) single nucleotide polymorphisms (SNPs) from the Hybrid Mouse Diversity Panel (HMDP) (31). We found that 1492 of the ϳ37,000 total open chromatin sites overlap with 1845 eQTL SNPs. Filtering the eQTL analysis for DE genes, we found 35 genes associated with 95 unique eQTLs overlapping 78 FAIRE sites. Interestingly, one of these genes, Lpin1, is also one of the top genes up-regulated in B6 mice fed HF diets. Lpin1 produces a phosphatidate phosphatase enzyme known to function in the triglyceride synthesis pathway (32). The Lpin1 locus contains a variable FAIRE site upstream from the transcriptional start site, overlapping several eQTL SNPs associated with Lpin1 expression (Fig. 1E).
We next examined the most variable FAIRE sites for enriched motifs by performing de novo motif discovery (see "Experimental Procedures"). Many liver and developmental TF

Diet-induced Chromatin Remodeling in Liver
binding sites were enriched ( Fig. 2A) with the top motif being HNF4␣, a ligand-dependent TF required for liver-specific gene expression (33). Using HNF4␣ ChIP-seq data (20), we found that 736 of the top 1000 variable sites are bound by HNF4␣ (Fig.  2B). In comparison, CTCF bound to only 145 of the top 1000 variable sites. Given that HNF4␣ cooperatively binds to DNA with CEBP/␣ and FOXA1 in liver cells (20), we further explored the relationship between combinatorial binding of these factors and chromatin variation (Fig. 2C) 2D; p ϭ 2 ϫ 10 Ϫ16 ; t test). Similar behavior was observed for FOXA1-and CEBP/␣-bound sites, but not CTCF-bound sites (Fig. 2D, FOXA1 p ϭ 2 ϫ 10 Ϫ16 , CEBP/␣ p ϭ 2 ϫ 10 Ϫ16 , CTCF p ϭ 0.95). Interestingly, we found that FAIRE sites targeted by all three TFs (H/C/F) display the greatest accessibility change (Fig. 2D, p ϭ 2 ϫ 10 Ϫ16 ). Given the striking overlap between liver TF binding and regions of chromatin accessibility changes under HF diet, we wanted to further investigate this relationship using a more unbiased approach. H3K4me1 is a characteristic mark of distal regulatory elements (34), and genomic profiles of this modification have been used to identify enhancer elements (35). We profiled H3K4me1 in the same samples using ChIP-seq (Fig.  2C). We first examined the relationship between gene expression and enrichment of H3K4me1 at promoters of differentially expressed genes. Examining the FPKM values for differentially up-regulated and down-regulated genes in both HF and control samples revealed that down-regulated genes start out with lower expression in control cells than up-regulated genes (Fig.  3A), indicating that up-regulated genes are already active in control cells. Consistent with this, consideration of the H3K4me1 read density in promoters of up-regulated and down-regulated genes indicated that in general, up-regulated genes have greater H3K4me1 signals than down-regulated genes in control cells, and the H3K4me1 levels are increased in HF (Fig. 3B).
To examine the enrichment of H3K4me1 at variable FAIRE sites on the genome scale, we ranked all union FAIRE peaks for control and HF diet by the -fold change of HF over control and evaluated the FAIRE and H3K4me1 signal at these regions for both HF and control (Fig. 3C). Aggregate plots of the top 1000 regions, a set of 1000 peaks in the middle of the ranking, the bottom 1000, and a randomly selected set of 1000 regions reveal that the regions with the greatest change in accessibility are pre-marked with H3K4me1, and the most variable open chromatin sites become more enriched with H3K4me1 (Fig. 3C).
It has been previously reported that mice strains display heterogeneity in physiological responses to distinct diets (14,36). We therefore next examined whether diet-induced chromatin variation in the liver is strain-specific by performing FAIRE profiling in livers of D2 mice under HF and control diet. Similar to B6 mice, D2 mice fed an HF diet for 8 weeks weighed more than control D2, as has been described previously (14). Oil Red O staining of livers from B6 and D2 mice revealed increases in hepatic lipid deposition in both strains for mice fed an HF diet as compared with control (Fig. 4A). We found similar numbers of open chromatin sites across the genome for D2 control and HF fed livers (26,661 control sites; 24,504 HF sites; 33,843 union sites; 17,030 common sites). As with the B6 strain, we found dramatic chromatin variability due to diet for D2 mice with approximately two-thirds of sites in common across conditions. Strikingly, however, the regions of greatest variability for D2 were largely distinct from the regions of variability for B6 (Fig. 4B), as exemplified by the Serpina5 locus (Fig. 4C). We verified that differences in chromatin variation for B6 and D2 were not due to decreased genome mappability for D2 by repeating the analysis with a pseudo-D2 reference genome (see "Experimental Procedures"). We performed de novo motif discovery with the top 1000 variable sites in the D2 genome and found that the enriched motifs are mostly distinct from those in the B6 top 1000 variable regions (Fig. 4D). Out of the top 10 enriched motifs, only SMAD3 was also identified in the B6 most variable regions, indicating that the regulatory networks involved in response to HF diet in the liver are largely distinct for B6 and D2.
We next investigated the possibility that local DNA sequence variation contributes to strain-specific chromatin variability. We stratified FAIRE sites into the following groups: Group A, HF-specific sites in B6 alone; Group B, HF-specific sites common to B6 and D2; Group C, HF-specific sites in D2 alone; Group D, diet-independent sites for B6 not observed in D2; and Group E, diet-independent sites for D2 not observed in B6. Each group of sites was assessed for overlap with single nucleotide variants (SNVs) between the B6 and D2 genome builds and compared with random sites of equal size and coverage (Fig. 4E). As expected, Group D and Group E, which contain strain-specific FAIRE sites that are not affected by diet, are statistically enriched for genetic variation as compared with random sites. Interestingly, Group A and Group C are also enriched for genetic variation as compared with random sites. In total, within Group A and Group C, ϳ50% of strain-specific chromatin variation sites contain SNVs. Given that the other

Diet-induced Chromatin Remodeling in Liver
ϳ50% of the strain-specific chromatin variation sites do not contain SNVs, there are clearly other mechanisms for chromatin variation involved as well.
To begin to understand the contribution of sites that display strain-specific chromatin variability to the physiology of the cell, we utilized the Genome Regions Enrichment of Annotations Tool (GREAT) to predict function for cis-regulatory elements (38). The top Gene Ontology (GO) biological process for Group A sites is "Positive regulation of JAK-STAT cascade," and this prediction was unique to Group A. Interestingly, JAK-STAT signaling pathway has been shown to be involved in the development of liver fibrosis (39), a phenotype that differs between the two strains of mice (40). These data suggest that strain-specific chromatin sites can contribute to strain-specific phenotypes.
Given the enrichment of SNVs at strain-specific FAIRE sites, we asked whether these sites were enriched for liver eQTL SNPs from the HMDP. We started by considering the 487 genes that are differentially expressed in control diets between B6 and D2 livers. Of these 487 genes, 207 have eQTL SNPs within 1 Mb of the gene body. Of these 207 genes, 163 have genotypes that are dimorphic between B6 and D2, and of these, 44 genes have variable FAIRE sites overlapping dimorphic eQTL SNPs (Fig.  5A). Another 44 of these 207 genes have common genotypes between B6 and D2, and of these, only 3 have variable FAIRE sites overlapping eQTL SNPs (p ϭ 0.004; Fisher exact test), indicating that genes with variable genotypes are more likely to have variable chromatin accessibility profiles. One of the genes with variable genotype and chromatin accessibility between B6 and D2 is the regulator of G protein signaling 16, Rgs16 (Fig.  5B), which has three eQTL SNPs overlapping a D2 HF-specific FAIRE site ϳ30 kb upstream of the transcription start site. Regulators of G protein signaling (RGS) proteins help control hepatic lipid homeostasis, and Rgs16 has been shown to signal for glucose production for inhibition of fatty acid oxidation (41,42).
These results indicate that the integration of chromatin, gene expression, and SNP data can be used to characterize regulatory variants. To evaluate this, we chose a B6 HF-specific FAIRE site overlapping an eQTL ϳ40 kb upstream of the Lpin1 locus (Fig. 5C) and generated enhancer constructs to test the enhancer function of these sites. We found that although the D2 variant of this enhancer site can slightly increase expression of the luciferase reporter relative to control vector, the B6 genotype has 4-fold higher enhancer activity at this site (Fig. 5D).

DISCUSSION
Complex metabolic diseases such as diabetes, obesity, and nonalcoholic fatty liver disease result from multiple genetic and environmental factors. Although genetic studies have revealed many loci predisposing individuals to risk of these diseases, they explain a very small fraction of the total trait variance. Clearly, environmental factors and gene-by-environment factors are very important (14). We have demonstrated here that one mechanism whereby environmental factors can influence disease risk is through chromatin remodeling. We found that diet elicits changes in chromatin accessibility, and our results show that the regions undergoing the greatest chromatin remodeling in the livers of HF-fed mice are bound by liver regulatory factors, indicating a modulation of liver regulatory networks under HF diet. We further show that we can pinpoint specific loci, such as the FAIRE site upstream from Lpin1 that may function to specifically regulate Lpin1 in a diet-specific manner. Characterization of loci such as these is critical to understanding the mechanisms that contribute to gene regulation and ultimately the pathways involved in disease progression.
We have also demonstrated that the regions of chromatin remodeling in livers of mice subjected to HF diet are strainspecific. The fact that distinct sites of the B6 and D2 genome display variation under diet suggests that different networks may be engaged in response to diet in each strain. These data also uncover a genetic component in how the environment can impact disease progression and suggest the existence of genetic-epigenetic crosstalk. Recently, it has been shown that local sequence variation may account for ϳ30% of chromatin variation in erythroblasts from distinct inbred mice in a single condition (43). In comparison, our data indicate that up to 50% of chromatin variations may be attributed to local sequence variation. These differences may be attributed to the fact that our studies include additional sites influenced by diet. We have also examined the potential function of these sites of strain-specific chromatin variability and found that the JAK-STAT pathway may be positively regulated in the B6 mice under HF diet, but not for D2 mice under the same diet. Future studies addressing the role of these sites in the development of physiological differences between the two strains will be important to our understanding of chromatin variation and phenotype.
Finally, further studies into the role of the chromatin variations identified in this study and genetic variants will illuminate distinct relationships between DNA sequence and chromatin variation. Indeed, although limited by variation in human-bred mouse populations, we are able to show that specific FAIRE sites containing genetic variation are linked to specific genes such as Lpin1 and Rgs16. These results further elucidate regulatory mechanisms associated with metabolic disorders such as obesity and hepatic steatosis and could lead to the identification of novel therapeutic targets.