Two-tiered Approach Identifies a Network of Cancer and Liver Disease-related Genes Regulated by miR-122

MicroRNAs function as important regulators of gene expression and are commonly linked to development, differentiation, and diseases such as cancer. To better understand their roles in various biological processes, identification of genes targeted by microRNAs is necessary. Although prediction tools have significantly helped with this task, experimental approaches are ultimately required for extensive target search and validation. We employed two independent yet complementary high throughput approaches to map a large set of mRNAs regulated by miR-122, a liver-specific microRNA implicated in regulation of fatty acid and cholesterol metabolism, hepatitis C infection, and hepatocellular carcinoma. The combination of luciferase reporter-based screening and shotgun proteomics resulted in the identification of 260 proteins significantly down-regulated in response to miR-122 in at least one method, 113 of which contain predicted miR-122 target sites. These proteins are enriched for functions associated with the cell cycle, differentiation, proliferation, and apoptosis. Among these miR-122-sensitive proteins, we identified a large group with strong connections to liver metabolism, diseases, and hepatocellular carcinoma. Additional analyses, including examination of consensus binding motifs for both miR-122 and target sequences, provide further insight into miR-122 function.


MicroRNAs (miRNAs) are small (20-24 nt) endogenously expressed noncoding RNAs that regulate the translational efficiency and/or degradation of specific mRNAs. First discovered in Caenorhabditis elegans (1), miRNAs have since been identified in a diverse set of eukaryotic organisms as well as viruses, with over 700 miRNAs currently identified in humans (2). miRNAs function via the RNAi pathway, guiding the RNA-induced silencing complex to mRNAs by Watson-Crick base pairing between the miRNA and target mRNA (reviewed in Ref. 3). The resulting interaction leads to translational repression of the mRNA, although how this repression is achieved remains unclear. Several mechanisms have been proposed (reviewed in Ref. 4), and many questions remain; however, it is clear that sequence complementarity lies at the heart of miRNA function. In animals, sequence complementarity between an miRNA and its target mRNA is rarely perfect. The vast majority of binding sites contain mismatches between strands, and these mismatches have been shown to play an important functional role in target repression (5,6). Imperfect complementarity allows for a greater variability of target sequences, thus increasing the number of potential binding sites for a given miRNA, with estimates of 300-400 targets on average per miRNA (7). Although many potential miRNA targets can be identified through predictions, no computational approach is all-inclusive, and even highly conservative predictions must be validated by experimental means to verify functional relevance.
Certain miRNAs have caught the attention of scientists due to their strong connections to diseases and cancer. Such is the case for miR-122, which has been implicated in liver-related diseases and hepatocellular carcinoma (HCC). miR-122 is highly abundant in the liver, accounting for 70% of total liver miRNA expression (8), and liver specificity seems to be conserved, at least from mouse to human (8,9). miR-122 has been associated with the regulation of liver metabolism as well as hepatitis C infection and is often down-regulated in HCCs (reviewed in Ref. 10). The use of antisense miR-122 down-regulated several genes implicated in liver metabolism and produced an increase in expression of hundreds of genes that are normally repressed in hepatocytes, suggesting that miR-122 functions as an important player in maintaining the liver phenotype (11,12). Moreover, silencing of miR-122 in high fat fed mice produced a reduction of hepatic steatosis, which can be linked to a reduction in cholesterol synthesis and stimulation of fatty acid oxidation (11).
miR-122 dysregulation has also shown a strong association with tumorigenesis. A reduction in miR-122 expression has been observed both in a rat model for HCC and in human HCC samples compared with pair-matched control tissues (13). Restoration of miR-122 expression in HCC cell lines impaired in vitro migration, anchorage-independent growth, invasion, angiogenesis, and intrahepatic metastasis (14). Similar findings have been obtained by other research groups (15,16). In addition, miR-122 has been shown to influence apoptosis; transfections of the hepatoma cell line Huh-7 with a miR-122 mimic produced an up-regulation in apoptosis levels as indicated by both flow cytometry and TUNEL assay (17).
Given the importance of miR-122 to proper liver function in health and disease, a more extensive knowledge of miR-122 targets would greatly improve our understanding of the function of this miRNA. To this end, we have developed a combined high throughput screen for miRNA-targeted genes that uses a luciferase-based assay and label-free quantitative proteomic mass spectrometry. Our combined approach revealed that miR-122 controls a network of genes with strong connections to liver metabolism, diseases, and cancer-related processes.

EXPERIMENTAL PROCEDURES
Human 3′UTR Luciferase Fusion Reporters-Human 3′-untranslated regions (3′UTRs) were systematically identified and cloned into an optimized luciferase reporter vector system (SwitchGear Genomics). The reporter protein contains a PEST protein degradation sequence that enables a more sensitive measure of repression. A total of 139 3′UTR luciferase fusion reporters were selected from this genome-wide collection based on the presence of one or more predicted miR-122 target sites as determined by the computational prediction sets identified by PicTar (18) and MicroCosm (formerly miRBase Targets) (2,19). An additional 16 reporters were used as negative controls, determined by a lack of predicted miR-122 target sites. This negative set included the empty vector, 11 constructs from the genome-wide collection, and 4 controls containing a random, nongenic, and nonconserved sequence in place of the 3′UTR. An additional 14 reporters were selected from the genome-wide collection for validation of putative miR-122 targets identified in the proteomics experiment.
Cell Growth, Transfection, and Luciferase Assays-96-Well plates were seeded with 6000 HT-1080 cells (ATCC) 18-24 h before transfection to achieve 80% confluence at the time of transfection. Each transfection included 0.15 μl of DharmaFECT Duo transfection reagent (Dharmacon), 100 ng of 3′UTR reporter, and sufficient mimic or nontargeting control miRNA (Dharmacon) to yield a final concentration of 20 nM in a total volume of 100 μl/well. Each construct was transfected in triplicate separately with either the miR-122 mimic or the nontargeting control. Plates were incubated at 37°C for 24 h post-transfection before being removed. 100 μl of SteadyGlo luciferase assay reagent (Promega) was added to each well, and plates were incubated at room temperature for 30 min and finally read on an LmaxII-384 luminometer (Molecular Devices).
To identify significantly repressed genes, we calculated a p value (one-tailed t test) and a log2 ratio for each reporter from the average luminescence values of the miR-122 mimic and nontargeting control transfections. We then established threshold values for significance based on the normal distribution of the negative control set: a conservative p value < 0.05 along with a minimum of 1.5-fold repression.
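The significance call described above can be sketched in a few lines. This is a minimal illustration, not the authors' analysis script: the function names are hypothetical, the pooled-variance (Student) form of the t test is an assumption because the text does not say which variant was used, and the t-distribution tail is obtained by simple numerical integration so that only the standard library is needed.

```python
import math
from statistics import mean, variance

def t_upper_tail(t, df, steps=2000):
    """P(T > t) for Student's t with df degrees of freedom (Simpson's rule)."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    pdf = lambda x: c * (1 + x * x / df) ** (-(df + 1) / 2)
    h = abs(t) / steps                      # steps is even, as Simpson's rule requires
    s = pdf(0.0) + pdf(abs(t))
    for i in range(1, steps):
        s += pdf(i * h) * (4 if i % 2 else 2)
    area = s * h / 3                        # integral of the pdf from 0 to |t|
    return 0.5 - area if t >= 0 else 0.5 + area

def call_repression(mimic, control, max_p=0.05, min_fold=1.5):
    """Return (log2 ratio, one-tailed p, significant?) for one reporter."""
    n1, n2 = len(mimic), len(control)
    m1, m2 = mean(mimic), mean(control)
    # Pooled variance (assumption: equal-variance Student t test).
    pooled = ((n1 - 1) * variance(mimic) + (n2 - 1) * variance(control)) / (n1 + n2 - 2)
    t = (m2 - m1) / math.sqrt(pooled * (1 / n1 + 1 / n2))
    p = t_upper_tail(t, n1 + n2 - 2)        # one-tailed: control > mimic
    log2_ratio = math.log2(m1 / m2)
    significant = p < max_p and log2_ratio < -math.log2(min_fold)
    return log2_ratio, p, significant
```

With these thresholds, 1.5-fold repression corresponds to a log2 ratio below about −0.58, and p < 0.05 corresponds to log10(p) below about −1.3.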
Mutagenesis Studies-Mutations of seed sequences were generated using a modification of the QuikChange (Stratagene) protocol (20). miR-122 target sites were predicted in the cloned sequences of six miR-122-responsive 3′UTRs identified by experimental screens. Two or three nucleotides were mutated within a single target site in each 3′UTR reporter construct. After sequence confirmation, mutant reporters were tested along with wild-type controls in quadruplicate as described above. Luminescence values of wild-type and mutant constructs were compared using the one-tailed t test. The log2 ratio of miR-122 mimic and nontargeting control for each wild-type and mutant construct pair was also calculated.
Proteomics Sample Preparation-HT1080 cells were transfected with 20 nM miR-122 mimic or mock transfection in T-25 flasks using the transfection reagent DharmaFECT 4 (Dharmacon). Cells were harvested 24 h post-transfection and lysed by Dounce homogenization in low salt buffer (10 mM Tris-HCl, pH 8.0, 10 mM KCl, 1.5 mM MgCl2) with protease inhibitor mixture (Calbiochem) and centrifuged at 1000 × g to separate into crude soluble (cytosolic) and insoluble (nuclear) fractions. Nuclear fractions were washed once and resuspended in low salt buffer, at which point both fractions were treated the same. For each sample, 2,2,2-trifluoroethanol was added to 50% (v/v). Samples were reduced with 15 mM DTT at 55°C for 45 min and then alkylated with 55 mM iodoacetamide at room temperature for 30 min. Following alkylation, samples were diluted in digestion buffer (50 mM Tris-HCl, pH 8.0, 2 mM CaCl2) to a final 2,2,2-trifluoroethanol concentration of 5% (v/v). Proteomics grade trypsin (Sigma) was added to a 1:50 (enzyme/protein) concentration, and samples were digested at 37°C for 5 h. The digestion was quenched with 1% formic acid, and sample volume was reduced to 20 μl by SpeedVac centrifugation. Digested peptides were bound and washed on HyperSep C-18 SpinTips (Thermo), resuspended in peptide buffer (95% H2O, 5% acetonitrile, 0.1% formic acid), and filtered through a Microcon 10-kDa centrifugal filter (Millipore), with the digested peptides collected as flow-through.
Peptides were separated on a Zorbax reverse-phase C-18 column (Agilent) using a 5-38% acetonitrile gradient over 230 min and analyzed on line by nanoelectrospray-ionization tandem mass spectrometry on an LTQ-Orbitrap (Thermo Scientific). Data-dependent ion selection was activated, with parent ion (MS1) scans collected at high resolution (100,000). Ions with charge >+1 were selected for collision-induced dissociation fragmentation spectrum acquisition (MS2) in the LTQ, with a maximum of 12 MS2 scans per MS1. Dynamic exclusion was activated, with a 45-s exclusion time for ions selected more than twice in a 30-s window.
Proteomics Data Analysis-Spectra were searched against the Ensembl release 54 (21) human protein-coding sequence database using Sequest (Bioworks version 3.3.1, Thermo Scientific). A 1% false discovery rate was determined against a reversed concatenated decoy database, with specific filters (i.e. ΔCN, XCorr) selected to maximize the number of protein identifications in the forward database while maintaining the percentage of reversed identifications at 1% of the total. The protein list was curated by collapsing into groups the proteins for which there was identical evidence of observation and by removing the proteins for which observed peptides could be accounted for by other proteins with additional unique observations. Preference was given to sequences with predicted miR-122 seed sites in the 3′UTR of the associated mRNA transcript. Differential expression of proteins across miR-122-treated and control samples was calculated from the spectral count based on the APEX method of quantitation (22). Proteins with a Z-score >1.65 (one-tailed) were considered significantly down-regulated in miR-122-treated samples.
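As an illustration of how a one-tailed Z-score cutoff of 1.65 can be applied to spectral counts, the sketch below uses a two-proportion Z test on the fraction of spectra assigned to a protein in each sample. Whether this matches the exact computation in the APEX method (22) is an assumption, the function name is hypothetical, and the counts in the usage example are invented for demonstration only.

```python
import math

def spectral_count_z(n_control, total_control, n_treated, total_treated):
    """Two-proportion Z-score; positive values mean lower abundance after treatment."""
    f_control = n_control / total_control
    f_treated = n_treated / total_treated
    # Pooled proportion under the null hypothesis of no change.
    f_pooled = (n_control + n_treated) / (total_control + total_treated)
    se = math.sqrt(f_pooled * (1 - f_pooled) * (1 / total_control + 1 / total_treated))
    return (f_control - f_treated) / se

# Hypothetical counts: 40 of 10,000 spectra in control, 20 of 10,000 after miR-122.
z = spectral_count_z(40, 10_000, 20, 10_000)
is_down_regulated = z > 1.65  # one-tailed threshold quoted in the text
```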
Cell Culture, Transfection, and Western Blot Analysis in Liver Cell Lines-Huh7 cells were obtained from the American Type Culture Collection and maintained in Dulbecco's modified Eagle's medium (DMEM) containing 10% FBS and 2% penicillin/streptomycin.
Huh7 cells were transfected with 40 nM miRIDIAN miRNA mimics (miR-122) (Dharmacon) utilizing Oligofectamine (Invitrogen). All control samples were treated with an equal concentration of nontargeting control mimic sequence to control for nonsequence-specific effects in miRNA experiments.
Cells were lysed in ice-cold buffer containing 50 mM Tris-HCl, pH 7.5, 125 mM NaCl, 1% Nonidet P-40, 5.3 mM NaF, 1.5 mM NaP, 1 mM orthovanadate, 175 mg/ml octyl glucopyranoside, 1 mg/ml protease inhibitor mixture (Roche Applied Science), and 0.25 mg/ml 4-(2-aminoethyl)benzenesulfonyl fluoride (Roche Applied Science). Cell lysates were rotated at 4°C for 1 h before the insoluble material was removed by centrifugation at 12,000 × g for 10 min. After normalizing for equal protein concentration, cell lysates were resuspended in SDS sample buffer before separation by SDS-PAGE. Proteins were transferred onto nitrocellulose membranes and probed with the indicated antibodies. Protein bands were visualized using the Odyssey Infrared Imaging System (LI-COR Biotechnology). Densitometry analysis of the gels was carried out using ImageJ software from the National Institutes of Health (rsbweb.nih.gov).
Comparison of Experimental Results with HCC Microarray Data-Correlations between expression of miR-122 and predicted targets were analyzed using miRNA and mRNA profiling data from a cohort of 94 HCC patients as described (23,24). Correlation of each of the predicted targets was evaluated using Pearson correlation analysis.
Identification of miR-122-binding Motif-The miR-122-binding motif was investigated in luciferase and proteomics data separately. Genes identified as direct targets were selected for binding site analysis. Putative target sites were identified by the presence of a 7-mer or greater seed complementary site. Ensembl (release 58) and RefSeq (GRCh37/hg19 assembly) 3′UTR sequences were both analyzed, with the sequence containing the greatest number of predicted target sites selected for the analysis. For each binding site identified by a 7-mer seed, a stretch of 40 bp from the 5′ of the site was extracted. The miR-122-binding secondary structure with the 40-bp site was then generated by RNAduplex (25). Because miRNA binding is imperfect, the motif was expressed probabilistically, as the likelihood that each nucleotide in the miRNA sequence binds to the target site. An miR-122-binding matrix was constructed with a value of 1 for the (i, j)-th element if the j-th nucleotide of miR-122 paired to the i-th site, and a value of 0 otherwise. Based on the binding matrix, the empirical probability of binding was obtained for each miR-122 nucleotide. For each miR-122 nucleotide, the pairing nucleotides in each binding site were tallied, and the empirical frequency of corresponding perfectly paired nucleotides or a bulge or mismatch was then calculated. Next, the average frequency of a bulge or mismatch between every adjacent perfectly paired site was determined. To compare the similarity of motifs derived independently from luciferase and proteomics datasets, the KL distance between the two motif-binding probabilities was calculated as shown in Equation 1, where P(i) is the binding probability of the i-th nucleotide of miR-122 obtained from the luciferase dataset, and Q(i) is the corresponding probability obtained from the proteomic dataset.
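Equation 1 itself is not reproduced in this text. The sketch below therefore assumes the standard Kullback-Leibler form, D(P‖Q) = Σ_i P(i) ln(P(i)/Q(i)), and shows how per-nucleotide binding probabilities could be derived from a 0/1 binding matrix and then compared. The normalization of each profile to sum to 1, the small pseudocount, the natural logarithm, and the function names are all assumptions made for illustration; the toy matrices are not data from the study.

```python
import math

def binding_probabilities(matrix):
    """matrix[i][j] is 1 if miRNA nucleotide j pairs in target site i, else 0."""
    n_sites = len(matrix)
    n_nt = len(matrix[0])
    return [sum(row[j] for row in matrix) / n_sites for j in range(n_nt)]

def kl_distance(p, q, pseudocount=1e-9):
    """KL distance between two binding profiles after normalizing each to sum to 1."""
    p = [x + pseudocount for x in p]   # avoid log(0) and division by zero
    q = [x + pseudocount for x in q]
    sp, sq = sum(p), sum(q)
    return sum((a / sp) * math.log((a / sp) / (b / sq)) for a, b in zip(p, q))

# Toy example with three miRNA positions and two sites per dataset.
luciferase_profile = binding_probabilities([[1, 1, 0], [1, 0, 1]])
proteomics_profile = binding_probabilities([[1, 1, 1], [1, 1, 0]])
distance = kl_distance(luciferase_profile, proteomics_profile)
```

A distance near zero indicates that the two profiles are nearly identical.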
Target Site Accessibility Prediction for miRNA Targets-The computational prediction of miRNA targets relied upon a set of computer programs, including Target, SigStb, SegFold, and Scanfd (26-28). We initially used Target to search for putative target regions containing complementary sequences with an miR-122 seed sequence (P2-P8) in which only one wobble base pair G:U or U:G was allowed in P3-P8. A target site accessibility computation was then performed to eliminate unfavorable sites, based on the hypothesis that a favorable target site should have an unstable folding region, have high target accessibility of the seed sequence, and have a distinct RNA secondary structure in the region immediately downstream of the complementary seed sequence.
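The seed search rule (complementarity to P2-P8 with at most one G:U or U:G wobble pair in P3-P8) can be sketched as a simple scan. This is an illustrative reimplementation under stated assumptions, not the Target program itself: the function name is hypothetical, and the mature miR-122 sequence used below is assumed to be the annotated 5p strand.

```python
# Assumed mature hsa-miR-122-5p sequence, written 5'->3'.
MIR122 = "UGGAGUGUGACAAUGGUGUUUG"
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}

def find_seed_sites(utr, mirna=MIR122, max_wobble=1):
    """Return (start, site, wobble_count) for 7-nt sites pairing with P2-P8."""
    utr = utr.upper().replace("T", "U")
    hits = []
    for start in range(len(utr) - 6):
        site = utr[start:start + 7]
        wobbles = 0
        ok = True
        for k, base in enumerate(site):
            pos = 8 - k                      # antiparallel: first site base pairs with P8
            pair = (mirna[pos - 1], base)
            if pair in WATSON_CRICK:
                continue
            # A wobble pair is tolerated only in P3-P8 (so P2 must be Watson-Crick).
            if pair in WOBBLE and pos >= 3 and wobbles < max_wobble:
                wobbles += 1
            else:
                ok = False
                break
        if ok:
            hits.append((start, site, wobbles))
    return hits
```

For the assumed sequence, a perfect P2-P8 match in the mRNA reads ACACUCC, and a site such as GCACUCC is accepted with one wobble pair at P8.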
To calculate the thermodynamic stability of the miRNA target site, SigStb was employed to compute a smoothly moving average stability score (Stbscr) for the region ±50 nt from the first position of the predicted seed sequence (P1), using a 50-nt sliding window. Stbscr is defined as a standard Z-score, Stbscr = (E − E_w)/STD_w, where E is the lowest free energy computed from a local segment of 50 nt, and E_w and STD_w are the sample mean and S.D., respectively, of the lowest free energies computed by folding all segments of the same size that are generated by taking successive overlapping segments of 50 nt stepped 1 nt at a time from the start to the end of the sequence.
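Once a folding program has produced the lowest free energy for every 50-nt window, the stability score reduces to a Z-score over those values. The sketch below assumes the energies are already available as a list (the folding step is not reproduced), uses the sample standard deviation, and relies on a hypothetical function name and invented energies.

```python
from statistics import mean, stdev

def stability_scores(window_energies):
    """Stbscr = (E - E_w) / STD_w for each window's lowest free energy E."""
    e_w = mean(window_energies)       # mean over all windows of the sequence
    std_w = stdev(window_energies)    # sample standard deviation (n - 1)
    return [(e - e_w) / std_w for e in window_energies]

# Hypothetical lowest free energies (kcal/mol) for five overlapping windows.
scores = stability_scores([-10.0, -12.0, -8.0, -20.0, -10.0])
```

Because more negative free energy means a more stable fold, a score above zero marks a window that is less stable than the average for that sequence.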
The target accessibility of a seed sequence was measured by the hybridization energy E_h of base pairing between the miRNA seed and the complementary seed sequence of the targeted mRNA. E_h = E_1 − E_cost was computed by Scanfd, where E_1 is the energy contribution from the entire base-paired region between the miRNA seed and the complementary seed sequence, and E_cost is the energy cost of opening the complementary seed sequence into a single-stranded state in the local folding region ±40 nt around P1.
The distinct RNA secondary structures found in the flanking regions of the computed target sequence were characterized by SegFold and Scanfd. First, a significance score and stability score for each overlapping segment were computed by sliding a fixed length window 1 nt at a time along the complete 3′UTR sequence from the 5′ to 3′ end using the program SegFold. Using a Monte Carlo simulation, the window size was systematically increased from 40 to 100 nt by a 2-nt step. 100 randomly shuffled sequences were generated for each overlapping wild-type segment, and the lowest free energies of each overlapping segment were calculated for both wild-type and random sequences. For each sequence, the most significant unusual folding regions in the target site flanking regions were selected. Finally, the corresponding RNA secondary structures of the unusual folding regions were computed by Scanfd.
Gene Ontology and Network Analyses-An in-depth literature search was performed in Pathway Studio 6 by the "add neighbors" algorithm to identify cell processes enriched among significantly repressed genes in the luciferase dataset. Significance was determined by Fisher's exact test. A similar analysis was performed on targets from the proteomics dataset containing an miR-122 seed complementary sequence in the 3ЈUTR. The add neighbors algorithm in Pathway Studio 6 was again used to identify targets that are associated with liver-related diseases. This analysis was performed for the combined luciferase and proteomics dataset with targets containing miR-122 seed sequence and also for the combined luciferase and the entire proteomics dataset.
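Fisher's exact test for enrichment can be written directly from the hypergeometric distribution. The sketch below is a generic one-sided version, not the Pathway Studio implementation; the text does not state whether the test was one- or two-sided, so that choice is an assumption, the function name is hypothetical, and the counts in the usage example are invented.

```python
from math import comb

def fisher_enrichment_p(k, n, K, N):
    """One-sided p value: probability of seeing at least k annotated genes
    among n selected genes when K of N background genes carry the annotation."""
    total = comb(N, n)
    # comb() returns 0 when the lower argument exceeds the upper one,
    # so impossible table configurations contribute nothing to the sum.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / total

# Hypothetical example: 8 of 40 repressed genes are annotated to a process
# that covers 50 of 1,000 background genes.
p_value = fisher_enrichment_p(8, 40, 50, 1000)
```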
The "add direct interactions" algorithm in Pathway Studio 6 was used to create a network of miR-122 predicted targets that have "expression" and/or "regulation" relations among other targets in the combined luciferase and proteomics set. Another kind of network was created by the "add common targets" algorithm that identifies nodes that have high connectivity to the genes in the dataset. The top 15 nodes showing high connectivity to the miR-122 predicted targets in the combined dataset were selected.

RESULTS
Identification and validation of miRNA targets can be a complicated task. Several alternatives are available, starting with numerous target prediction methods and continuing with different biological procedures that encompass reporter-based screenings, shotgun proteomics, and Ago2-based immunoprecipitation methods (29-33). We used two independent high throughput approaches, a luciferase reporter-based screen and a quantitative shotgun proteomics analysis, to identify a large set of genes under the influence of miR-122. These two approaches are very distinct in nature; the luciferase screen focuses on a list of genes derived from computational predictions, whereas the proteomics approach is open-ended. Datasets from both analyses were subsequently compared and analyzed with bioinformatics tools to determine particular connections between identified mRNA targets and specific biological processes.
Luciferase-based Strategy to Map miR-122 Targets-A genome-wide clone collection of luciferase reporters containing human 3′UTRs was used to conduct a screen of 139 computationally predicted miR-122 target genes to identify experimentally responsive targets. Target predictions were obtained from two commonly used sources, PicTar (18) and MicroCosm (2,19). Individual luciferase reporter constructs were employed in co-transfection experiments with 20 nM miR-122 mimic or a nontargeting mimic control in the HT1080 fibrosarcoma cell line. Thresholds of p < 0.05 (one-tailed t test) and down-regulation greater than 1.5-fold were determined from a set of negative controls to define the statistically significant subset of miR-122-responsive targets (>95% confidence) (Fig. 1).
FIGURE 1. Luciferase-based screening to identify miR-122 targets. SwitchGear luciferase reporter constructs containing 3′UTRs of 139 genes predicted to contain miR-122 target sites (blue ◆) were assayed for repression in response to miR-122 treatment. Negative controls (red ■) consisted of 11 3′UTRs from human genes lacking predicted miR-122 target sites and five scrambled sequence controls. An additional 14 genes (yellow ▲) identified as putative targets in the proteomic analysis were also assayed. Constructs exhibiting greater than 1.5-fold repression (log2 ratio < −0.58) and a p value < 0.05 (log10(p value) < −1.3) (miR-122-treated versus control) were considered significantly down-regulated (bottom-left quadrant).
This repressed subset contains 37 of the 139 predicted targets screened (27%), with 24 of 37 (65%) repressed >2-fold. The complete list of results can be found in supplemental Table S1.
The use of multiple computational predictions for the selection of our test set allowed us to examine how each algorithm performed in the luciferase assay. In addition to PicTar and MicroCosm predictions, we also considered predictions from TargetScan 5.1 (7). Although TargetScan was not used in the initial selection of predicted targets, several genes in our test set overlap with TargetScan predictions. To best evaluate prediction methods, we considered each method independently, as well as in combination for genes predicted by more than one method. Of the 139 genes in the predicted target set, 85 were identified by TargetScan (27 if only conserved sites are considered), 54 by PicTar, and 101 by MicroCosm. The results, summarized in Table 1, indicate that TargetScan predictions (with and without conservation included) were the most likely to be validated as targets of miR-122 in the luciferase assay and that prediction by multiple algorithms increased the likelihood of repression, whereas genes predicted by MicroCosm alone performed quite poorly in our assay.
Mapping of miR-122 Targets by Shotgun Proteomics-For our second independent approach to identify miR-122 targets, we turned to mass spectrometry-based quantitative proteomics. Recent studies by Selbach et al. (33) and Baek et al. (29) examined miRNA-induced changes in protein levels by MS, using the isotopic labeling technique SILAC to quantify protein abundance. These papers report measurable repression in both mRNA and protein levels; however, the effect was consistently greater at the protein level, emphasizing the importance of measuring changes in cellular protein abundance. For this study, we adapted the APEX method of label-free protein quantitation by mass spectrometry (22).
For proteomics experiments, HT1080 cells were also used to maintain consistency with the luciferase assay. Cells treated with 20 nM miR-122 mimic or mock transfection for 24 h were lysed and split into cytosolic and nuclear fractions, with each analyzed independently across three replicate samples. In total, 2422 proteins were observed in at least two replicates, with 1704 and 1903 proteins observed in the cytosolic and nuclear fraction, respectively. 271 proteins were significantly repressed in at least one fraction of the miR-122-treated samples (Z-score >1.65, fold-change >1.3). After discarding proteins for which apparent miR-122-induced down-regulation in one fraction was contradicted by an increase in the other, we arrived at a final list of putative miR-122 targets containing 226 proteins identified as significantly down-regulated across the total cellular pool, i.e. in both fractions (supplemental Table S2).
As a first step toward confirming the putative targets, we examined the significantly down-regulated set for features of miR-122 targeting. The simplest feature of most miRNA-binding sites is the "seed" site, encoding a sequence in the mRNA 3′UTR that is perfectly complementary to nucleotides 2-7 at the 5′-end of the mature miRNA guide strand. To decrease false-positive predictions, we used a comparatively strict definition of the seed site, requiring at least seven sequential matches complementary to positions 1-7 or 2-8 of the miRNA. 75 of the 226 identified targets contained a 7-mer or greater seed complementary sequence in the 3′UTR. Down-regulated proteins containing at least one seed site showed a 2-fold enrichment over the total distribution of seeds. This enrichment increased to 3.8-fold for genes containing the even stricter 8-mer seed match, consistent with previous studies showing a greater repressive effect for 8-mers over other seed lengths (Fig. 2A) (34-36). To further support the claim that our down-regulated set is enriched for direct targets, we mapped conserved miR-122 target predictions from TargetScan 5.1 onto our dataset. Of the 124 conserved targets in the TargetScan database, 19 were mapped to the 2422 proteins, with 10 of the 19 (53%) found in the down-regulated set, a 5.5-fold enrichment (Fig. 2B). Including nonconserved TargetScan predictions showed little improvement over the 7-8-mer seed prediction. Together, these results indicate that the down-regulated set is significantly enriched for real miR-122 targets.
Next, we looked to the literature for miR-122 targets that have been experimentally validated at the mRNA and/or protein level, providing a set of positive controls. Indeed, we observed several of these validated targets in the down-regulated set of proteins, including GYS1 and aldolase A (11). We also observed down-regulation of vimentin, which is commonly up-regulated in cancers, including HCC, and was shown to decrease significantly in HCC cells upon miR-122 expression (14,16). Vimentin is a marker for mesenchymal cells and is positively associated with invasiveness and metastatic potential (37-39). Although not previously identified as a direct target, the vimentin 3′UTR does contain a 7-mer-m8 seed site, suggesting it may indeed be a direct miR-122 target.
Two additional proteins identified in our down-regulated set, citrate synthase and IQ-motif-containing GTPase-activating protein 1 (IQGAP1), have been previously implicated as miR-122 targets (7,18,40). IQGAP1 is particularly intriguing, as it was recently identified along with vimentin as a factor in hepatotumorigenesis (Ref. 41 and reviewed in Ref. 42). These examples demonstrate the effectiveness of our experimental approach toward identifying a subset of proteins enriched in miR-122 targets.
Comparison of Luciferase and Proteomic Data Sets-Following the initial screen, we selected an additional 14 miR-122 target genes identified through the proteomic analysis and evaluated their 3′UTR response to miR-122 by the luciferase reporter assay. The results showed a very good correlation between the two methods employed; of the 14 tested, 7 genes exhibited significant down-regulation, and an additional 3 were significantly repressed with p values < 0.05, but the change in expression did not surpass the 1.5-fold repression we established as the cutoff for significance. Combining the 7 new targets with 3 genes identified in the initial screening, a total of 10 proteins were validated as high confidence direct targets with significant down-regulation in both analyses (Table 2). All 10 genes contain predicted target sites in their 3′UTRs; however, only ALDOA, CS, and IQGAP1 have been previously validated as direct miR-122 targets. Furthermore, all previous validations of these three targets have been carried out in mice (11,40,43), making this the first validation of these three targets in a human cell line. An additional 34 targets were significantly down-regulated in the luciferase analysis, whereas the proteomics revealed 65 additional direct targets (containing a 7-mer or greater miR-122 seed site) and 151 indirect targets (lacking a 7-mer miR-122 seed site) in the down-regulated set (Fig. 3).
Confirmation of Target Down-regulation in Liver Cells-Our dual approach to target identification revealed many proteins responsive to miR-122. To determine the importance of context and confirm that targets identified in HT1080 cells showed similar response in a liver cell line, we first analyzed changes in abundance for five proteins in the Huh-7 hepatocellular carcinoma cell line. Western blot analysis revealed all five identified targets to be significantly down-regulated in response to miR-122 transfection (Fig. 4).
Two recent studies identified changes in expression patterns for a subset of genes that were anti-correlated with miR-122 expression as revealed by microarray analysis of HCC patients (23,24). To further validate the biological relevance of targets identified in this study, and particularly in the context of HCC, we looked at whether our identified targets exhibited a strong anti-correlation with miR-122 as described in these recent studies (supplemental Table S3). Indeed, of the 41 targets empirically derived from the luciferase-based strategy that we mapped to the microarray dataset, 18 were negatively correlated with miR-122 expression in HCC tissues (Pearson correlation < −0.4, p value < 0.0001). For targets identified by the proteomic approach, 27 of 68 mapped targets were negatively correlated with expression of miR-122 (Pearson correlation < −0.4, p value < 0.0001), including 7 of the 10 targets cross-validated by luciferase. In total, 38.4% of mapped targets showed strong negative correlation with miR-122, whereas only 2 of 99 showed significant positive correlation, suggesting that the observed changes in protein levels as determined by independent assays in HT1080 cells are indicative of functional miR-122 targets under biologically relevant conditions.
FIGURE 2. Proteomic analysis identifies down-regulated proteins with strong enrichment for miR-122 target site prediction. Predicted target sites were identified within 3′UTRs of proteins down-regulated in our proteomic analysis using various target prediction algorithms as follows: seed site complementarity of 7-mer-A1 and 7-mer-m8 motifs, 8-mer, and either 7- or 8-mer (7-8-mer) seeds, as well as TargetScan 5.1 with strict conservation of target sites (TS-Con). Results were plotted as the occurrence of a predicted target site versus the ranking of the proteins based on increasing down-regulation (A), and level of enrichment as the ratio of the frequency of predicted targets within each subset versus the total experimental dataset (B).
miR-122-binding Site Analysis-To further validate our findings, we selected six miR-122 direct targets identified from our study for mutagenesis analysis. Two to three bases were mutated in the seed recognition sequence of each 3′UTR reporter. In 5 of 6 cases, mutating the miR-122 seed recognition sequence resulted in significantly decreased repression by miR-122 (p value < 0.05), measured by luminescence in the presence of the miR-122 mimic (Fig. 5). To better understand this regulation, we determined the secondary structure of target mRNAs (wild type and mutant) in the presence of miR-122 (supplemental Fig. S1 and Table S4). The miRNA accessibility in the seed complementary region of mRNAs appears to be very high for the six wild-type mRNAs, but it is decreased at least 40% for all mutated seed complementary sequences, as determined by the contribution of hybridization energy between the miRNA seed and mRNA seed complementary sequence. The data indicate that the thermodynamic stability of the local mRNA fold around the seed complementary sequence is below average (the stability score is greater than zero) and is increased for all mutated seed complementary sequences (stability score is decreased). These results also indicate that the seed complementary sequence is not involved in any local distinct RNA structure (within 50 nt), although neighboring regions do appear to have significant structures (supplemental Fig. S1). It remains unclear whether these adjacent secondary structures play any role in miRNA-mediated regulation. The two independently derived sets of experimentally determined targets allowed us to examine whether miR-122 target sites share any common features by identifying consensus binding motifs for the miRNA and mRNA alike.
Given that the selection of luciferase targets involved computational predictions of target sites, which could bias the motif search, the initial motif analysis focused on the proteomics dataset alone. Motifs were calculated from target sites containing a 7-mer or greater miR-122 seed complementary sequence in the significantly down-regulated dataset. Each sequence position was scored probabilistically as the likelihood of involvement in target site binding (Fig. 6). The miR-122 5′-end corresponding to the seed sequence shows high probability as expected based on selection criteria. However, motif characteristics of the middle and 3′-end were not affected by selection bias and thus reflect the general interaction features of the miRNA and target site. Interestingly, despite possible selection bias, the luciferase-derived binding motif was highly similar to the proteomics-derived motif, with a calculated KL distance of 0.0021; the small KL distance quantifies the close similarity of the two binding motifs (data not shown).

... were selected for a mutagenesis study to partially validate our analysis and determine the importance of the seed sequence in miR-122 function. Two to three nucleotides were altered in the 3′UTR region of each gene matching the seed sequence and analyzed in luciferase assays. * designates a p value < 0.005; ** designates a p value < 0.005. Mut, mutant; ALDOA, aldolase A.
The miR-122-binding motif contains two regions with high binding probability separated by a "bulge" region of four bases corresponding to nts 10-13 exhibiting substantially decreased binding probability. This bulge region has been identified in other miRNAs and may contribute to proper miRNA-target interaction (5,44). The 3′-end of the miRNA shows a high frequency of binding, especially in nts 15-21. Previous studies have shown binding in this region to be common but not essential for other miRNAs (34,35), and our results confirm the significance of this region to provide stability to the overall interaction. Analysis of the mRNA-binding motif shows strong binding complementary to the miRNA seed sequence as expected, but it lacks a consistent binding motif to complement the miRNA 3′-end due to frequent mismatches, indicating a lack of consensus with regard to bulge length. Thus, although the miRNA-binding motif implicates extensive involvement of the 3′-binding region, the exact placement of where this binding occurs in the target strand varies greatly between target sites. Examples of this can be found in supplemental Fig. S1, d-f, where the target mRNA secondary structures contain stem loops of varying sizes within the bulge region, thereby greatly affecting the distance between the seed complementary sites and 3′ complementary sites within the target strand.
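To make the motif comparison above concrete, the following minimal sketch shows one way a KL distance between two per-position binding profiles could be computed. The text does not state how the reported KL distance was calculated, so everything here is an assumption for illustration: the normalization of each profile into a distribution over positions, the pseudocount, the symmetrized form of the divergence, and the two example profiles, which are invented numbers and not data from this study.

```python
import math


def normalize(profile, pseudocount=1e-6):
    """Convert per-position binding scores into a probability distribution.

    The pseudocount (an assumed value) keeps every entry strictly positive
    so that the logarithm below is always defined.
    """
    smoothed = [score + pseudocount for score in profile]
    total = sum(smoothed)
    return [score / total for score in smoothed]


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(P || Q) in nats."""
    if len(p) != len(q):
        raise ValueError("profiles must have the same length")
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def symmetric_kl(profile_a, profile_b):
    """Symmetrized KL distance: the mean of D(A || B) and D(B || A)."""
    p, q = normalize(profile_a), normalize(profile_b)
    return 0.5 * (kl_divergence(p, q) + kl_divergence(q, p))


# Hypothetical binding probabilities for miRNA positions 1-22 (made up):
# high in the seed, low in the nt 10-13 bulge, high again toward the 3' end.
proteomics_motif = [0.30, 0.95, 0.97, 0.96, 0.95, 0.94, 0.93, 0.90, 0.60, 0.25, 0.20,
                    0.20, 0.25, 0.55, 0.75, 0.80, 0.82, 0.80, 0.78, 0.75, 0.70, 0.40]
luciferase_motif = [0.35, 0.96, 0.98, 0.97, 0.96, 0.95, 0.94, 0.88, 0.55, 0.28, 0.22,
                    0.18, 0.27, 0.50, 0.72, 0.78, 0.85, 0.79, 0.80, 0.73, 0.68, 0.45]

if __name__ == "__main__":
    distance = symmetric_kl(proteomics_motif, luciferase_motif)
    print(f"symmetrized KL distance: {distance:.4f}")
```

Under these assumptions, identical profiles give a distance of zero and the value grows as the two profiles diverge, which is the sense in which a small KL distance indicates similar motifs.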
Gene Ontology and Biological Association Analysis of Datasets-We expect miRNA-mediated regulation to display functional network characteristics similar to those observed for RNA-binding proteins (45,46). In the case of miR-122, we anticipated the identification of genes associated with liver metabolism, liver diseases, and HCC. To define the biological nature of genes obtained in our study, we performed gene ontology and biological association analyses. First, we looked for enrichment of specific biological processes in both the proteomics and the luciferase datasets. Pathway Studio 6 (Ariadne Genomics) was used to identify biological processes that were enriched for the down-regulated set obtained from the combined luciferase and proteomics approaches. The add neighbors algorithm was used to obtain cell processes downstream of the targets. The highly connected cell processes were compared with the background targets that were not affected by miR-122. We used Fisher's exact test with p value ≤ 0.05 to select significantly enriched cell processes. Apoptosis, cell cycle, cell death, cell differentiation, cell growth, cell proliferation, and mitosis were the top processes determined to be significantly enriched in the down-regulated set with respect to background (Fig. 7). Further analyses identified 30 genes from the "direct target set" (luciferase and proteomics combined), which have multiple associations with liver diseases, including HCC, metabolism, and function (Fig. 8 and supplemental Table S7). A larger number of connections were obtained when indirect targets (lacking a 7-mer seed site) were also included (supplemental Fig. S2 and Table S9). A more comprehensive list of genes related to diabetes and cancer can be found in Table 3 and supplemental Table S5.
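The enrichment test described above can be illustrated with a short, self-contained sketch. It is not the Pathway Studio implementation: it assumes a one-sided Fisher's exact test on a 2 x 2 table (down-regulated versus unaffected background targets, linked versus not linked to a given cell process), and the function name and the counts in the usage example are hypothetical.

```python
from math import comb


def fisher_enrichment_p(hits_in_set, set_size, hits_in_background, background_size):
    """One-sided Fisher's exact p value for enrichment.

    Returns the probability of seeing at least `hits_in_set` process-linked
    genes among the `set_size` down-regulated genes if process membership
    were independent of down-regulation (hypergeometric upper tail).
    """
    total_genes = set_size + background_size
    total_hits = hits_in_set + hits_in_background
    max_hits = min(set_size, total_hits)
    # Count the ways to obtain k linked genes in the down-regulated set,
    # summed over every k at least as extreme as the observed value.
    favourable = sum(
        comb(total_hits, k) * comb(total_genes - total_hits, set_size - k)
        for k in range(hits_in_set, max_hits + 1)
    )
    return favourable / comb(total_genes, set_size)


if __name__ == "__main__":
    # Hypothetical counts: 40 of 260 down-regulated proteins versus
    # 60 of 1000 unaffected background proteins linked to one process.
    p_value = fisher_enrichment_p(40, 260, 60, 1000)
    print(f"p = {p_value:.3g}; enriched at the 0.05 level: {p_value <= 0.05}")
```

Exact integer arithmetic with `math.comb` avoids floating-point underflow for large tables; a two-sided variant would need to sum over all tables as or less probable than the observed one.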
Although these associations are not conclusive, they are certainly consistent with a role for miR-122 regulation in HCC development. In further support of this role, a recent study comparing mRNA and miRNA profiles across tumor and nontumor tissue samples of HCC identified a network of mitochondrial genes responsive to miR-122 expression that becomes dysregulated upon down-regulation of miR-122 in HCC tumors, leading to loss of mitochondrial metabolic function (23). Interestingly, the authors (23) found that miR-122-responsive genes were strongly enriched for cell cycle-associated processes, a finding that is consistent with our results despite a limited overlap in genes identified between the two studies. These associations suggest a role for miR-122 in regulating not just liver metabolism but general liver function and tissue identity.
To identify gene functional networks that might be modulated by miR-122, we searched for biological connections between genes within the entire down-regulated dataset. Connections were established based on regulation of expression, function, or protein association described in the literature and retrieved by Pathway Studio 6. A network of 33 genes was identified (Fig. 9 and supplemental Table S8) containing down-regulated genes with either a predicted miR-122 target site or a direct connection to the predicted target. The genes with highest connectivity are JUN, a well-established oncogene with defined roles in transcription regulation (reviewed in Ref. 47), and RAC1 (Ras-related C3 botulinum toxin substrate 1), a member of the RAS superfamily of small GTP-binding proteins that are implicated in the control of cell growth, cytoskeletal reorganization, and the activation of protein kinases (reviewed in Ref. 48). Neither JUN nor RAC1 contains predicted miR-122 target sites; however, both engage in functional interactions with several miR-122 targets identified in our screen. Another interesting gene, cAMP response-element binding protein (CREB1), was also shown to be strongly connected in our network, including connections to both RAC1 and JUN. CREB1 is a transcription factor involved in the regulation of a wide variety of cellular processes and is tied to oncogenesis (reviewed in Ref. 49). CREB1 does contain a predicted miR-122 target site, implicating it as a hub for the indirect regulation of numerous genes by miR-122.

MAY 20, 2011 • VOLUME 286 • NUMBER 20

JOURNAL OF BIOLOGICAL CHEMISTRY 18073
Expanding upon our functional network, we identified several genes outside of our dataset with strong connectivity to our down-regulated gene set, including several that have been strongly implicated in apoptosis, cell growth, and proliferation and are key regulators of tumorigenesis (supplemental Fig. S3 and Tables S6 and S10). Most notably, p53, the extensively characterized tumor suppressor gene whose functions include cell cycle regulation and DNA repair (reviewed in Ref.

DISCUSSION
Our combined high throughput approach identified 260 genes repressed in response to miR-122, several of them key players in liver metabolism, disease, and cancer. Our data establish miR-122 as a regulatory node in a functional network of genes involved in liver metabolism and disease. Our results indicate that the number of proteins affected by miR-122 extends far beyond direct targets to include indirect but functionally related targets. To highlight the extensive information contained within these functional networks, we will discuss a few highly relevant genes identified in this study in the context of liver processes and disease.

FIGURE 8. miR-122 and liver-related functions and diseases. The association between identified miR-122 targets and liver-related processes and diseases was investigated with Pathway Studio 6. Green indicates the genes identified in the proteomics analysis containing at least one miR-122 seed sequence; red indicates the genes identified by the luciferase analysis; blue indicates the genes identified by both methods. Liver function and disease associations are displayed in purple. ALDOA, aldolase A; Vim, vimentin; CS, citrate synthase; PXN, paxillin.

TABLE 3 Distribution of identified miR-122 targets according to biological function and diseases
The colors used are as follows: green, direct targets identified with proteomics analysis; pink, direct targets identified by luciferase assay; blue, high confidence targets identified by both methods.
miR-122 has an established role in hepatocarcinoma/hepatoma (13,15,55,56), functioning as a tumor suppressor gene, and is frequently down-regulated in tumor samples and HCC cell lines (reviewed in Ref. 10). Our data suggest that miR-122 controls a complex network of genes involved in cell cycle, proliferation, apoptosis, survival, and mutagenesis; therefore, miR-122 down-regulation could promote tumor formation in multiple ways. We will summarize the role of several direct targets we have identified in the context of tumorigenesis.
Tyrosine-protein phosphatase nonreceptor type 1 (PTPN1) is an enzyme that is the founding member of the protein-tyrosine phosphatase family. Protein-tyrosine phosphatases regulate numerous cellular processes, including cell growth, differentiation, cell cycle, and oncogenic transformation (reviewed in Ref. 57). PTPN1 (also known as PTP1B) has been explored as a potential target to control type 2 diabetes and obesity (58,59) and has been shown to regulate glucose homeostasis, body weight, and energy expenditure thanks to its function as a negative regulator of insulin- and leptin receptor-mediated signaling pathways (60,61). Additionally, PTPN1 has been suggested to function as an oncogene in the context of breast cancer. PTPN1 is up-regulated in HER2/Neu-transformed cells, and 90% of all breast tumor samples tested overexpress both PTPN1 and HER2/Neu (62-64).
A few miR-122 targets are associated with microtubules, including SEPT2 and SEPT9, members of the septin family. Septins were initially determined to be involved in cytokinesis and cell cycle control but more recently have been shown to have a role in microtubule-dependent processes, such as karyokinesis, exocytosis, and maintenance of cell shape (65). Both SEPT2 and SEPT9 were observed to be up-regulated in a variety of tumor types, including HCC (66). Another microtubule-associated target is vimentin (VIM), a member of the intermediate filament family. Intermediate filaments, along with microtubules and actin microfilaments, make up the cytoskeleton. Vimentin is implicated in the maintenance of cell shape, via stabilization of cytoskeletal interactions, and plays a role as an organizer of proteins implicated in attachment, migration, and cell signaling (reviewed in Ref. 67). In cancer, vimentin overexpression has been associated with HCC metastasis (68). Moreover, proteomic analysis indicated that circulating vimentin is higher in patients with small HCC than in normal non-neoplastic controls, suggesting its use as a potential surrogate marker (37).
Another direct target identified in our dataset is the matrix metalloproteinase MMP7, which plays a role in the breakdown of extracellular matrix in normal physiological processes as well as in metastasis (reviewed in Ref. 69). MMP7 expression was shown to be anti-correlated with miR-122 in HCC patients, and levels of matrix metalloproteinase expression and the stage of tumor progression are frequently correlated. Moreover, it has been suggested that matrix metalloproteinases are also necessary for the creation and maintenance of a microenvironment that helps tumor growth and angiogenesis (70). MMP7 was determined to be up-regulated in cirrhotic nodules (potential precursors of HCC), compared with normal liver, as well as in HCC (71). Furthermore, elevated MMP7 expression is correlated with decreased survival and increased recurrence and liver metastasis of colon cancer (72) and pancreatic carcinoma (73). Most importantly, MMP7 was demonstrated to promote in vitro invasiveness of cancer cells of the stomach, colon, and pancreas (reviewed in Ref. 69).
Paxillin has also been implicated in metastasis. Paxillin is a multidomain protein that is primarily present in sites of cell adhesion to the extracellular matrix known as focal adhesions. Paxillin interacts with many proteins involved in the organization of the actin cytoskeleton, which are required for cell motility and implicated in a variety of biological processes, including tumor metastasis (74,75). Expression of paxillin protein in HCC may affect the invasive and metastatic ability of the tumor. In this regard, paxillin up-regulation was found to correlate with the presence of extrahepatic metastasis in hepatocellular carcinoma (76) and lymph node metastasis in breast tumors (77). Positive paxillin protein expression was associated with low differentiation, with the presence of portal vein thrombosis, and with extra-hepatic metastasis (76).
In addition to liver cancer, miR-122 has a direct role in the regulation of cholesterol and lipid metabolism (11,78). The expression of miR-122 is highly restricted to the liver, where it is believed to maintain the differentiated state (8,9). Silencing of miR-122 down-regulates genes implicated in cholesterol biosynthesis and triglyceride metabolism leading to reduction of total cholesterol levels (11,78), making miR-122 a viable candidate for therapeutic inhibitors to lower cholesterol in humans.
In this study, we have identified several miR-122-responsive genes involved in glucose homeostasis and the citrate cycle, the regulation of which can ultimately alter lipid metabolism. The liver synthesizes triacylglycerols from fatty acids when glucose levels are high and acetyl-CoA production exceeds the energy requirements of the cell (79-82). Glucose provides the necessary substrates for triacylglycerol synthesis (acetyl-CoA for fatty acid synthesis and glycerol) using reactions in the glycolytic pathway and the citrate cycle (79-82). Aldolase A, a direct target of miR-122, cleaves fructose 1,6-bisphosphate to generate dihydroxyacetone phosphate and glyceraldehyde 3-phosphate. Dihydroxyacetone phosphate can be used to make glycerol 3-phosphate, which can then be converted to triacylglycerol (79-82). Glyceraldehyde 3-phosphate is processed into pyruvate through the glycolysis pathway, ending with the conversion of phosphoenolpyruvate to pyruvate with the help of pyruvate kinase. We identified the muscle isoform of pyruvate kinase (PKM2) as a direct miR-122 target, whereas the liver form, PKLR, was not repressed in our luciferase assay. Unlike PKM2, PKLR responds to regulation by epinephrine and glucagon, allowing the liver to shift toward gluconeogenesis in response to stimuli such as low blood glucose levels (83,84).
In order for pyruvate to enter the citrate cycle, it is first converted to acetyl-CoA through oxidative decarboxylation. Citrate synthase, a direct target of miR-122, catalyzes the rate-limiting first step in the citrate cycle, the condensation of oxaloacetate and acetyl-CoA to form citrate (79-82). Citrate can also be cleaved to re-generate acetyl-CoA and oxaloacetate by citrate lyase (ACLY), an indirect target of miR-122. Acetyl-CoA can also be converted to malonyl-CoA, the building block for fatty acid synthesis by fatty-acid synthase (79). Ultimately, fatty acids and glycerol are combined to form triacylglycerols that are packaged into VLDL particles in the liver and transported to the adipose tissue where they are stored in lipid droplets. It is worth noting that many of these genes are in pathways that are still functional, although highly regulated, in liver. Although miRNA regulation is often thought of as a method for shutting down protein expression, these examples demonstrate the more nuanced role of miRNAs as chemostats, allowing for modulated control over expression and dampening of stochastic noise to stabilize protein levels (85-87).
We also identified the muscle isoform of glycogen synthase (GYS1) as a direct target of miR-122. Glycogen synthase catalyzes the rate-limiting step in glycogenesis. Although the liver-specific isoform GYS2 lacks miR-122 target sites, the GYS1 3′UTR contains three sites, suggesting miR-122 might play an important role in maintaining the tissue-specific expression of glycogen synthase isoforms by suppressing the non-liver form. This is the likely scenario for pyruvate kinase isoforms PKM2 and PKLR as well.
The aforementioned evidence shows the vital role of miR-122 in lipid metabolism. In its regulation of multiple genes, including PKM2, ALDOA, CS, and GYS1, miR-122 is shown to regulate glucose homeostasis and ultimately lipid metabolism. As a result, the effects on lipoprotein metabolism previously observed in miR-122 antisense oligonucleotide-treated mice may be due at least in part to an alteration in glucose homeostasis caused by miR-122 depletion.
As additional miRNAs are identified and investigated, the importance of their function in the regulation of expression grows more evident, especially as pertains to control of cell growth and maintenance of the differentiated state. The extent to which miRNAs have been implicated in tumorigenesis and disease progression across virtually all cancer types is indicative of this importance, yet our understanding of how miRNAs are involved remains quite limited. The development of high throughput screens such as ours will lead to a more comprehensive identification of the large numbers of biologically relevant miRNA targets. These identifications will in turn allow us to map out regulatory networks and reveal the molecular mechanisms through which miRNAs function, or in the case of disease, dysfunction. Our approach has generated the largest dataset of experimentally tested miR-122 targets currently available. In addition, our network analysis has led to the identification of many interactions through which miR-122 could affect the development and progression of hepatocellular carcinoma. By using the data produced through approaches such as ours, we expect to see the development of improved target prediction algorithms, the validation of a larger number of targets, and through validated targets, a greater understanding of miRNA function.