Sequential extracellular matrix-focused and baited-global cluster analysis of serial transcriptomic profiles identifies candidate modulators of renal tubulointerstitial fibrosis in murine adriamycin-induced nephropathy.

Transcriptome analysis using microarray technology represents a powerful unbiased approach for delineating pathogenic mechanisms in disease. Here molecular mechanisms of renal tubulointerstitial fibrosis (TIF) were probed by monitoring changes in the renal transcriptome in a glomerular disease-dependent model of TIF (adriamycin nephropathy) using Affymetrix (mu74av2) microarray coupled with sequential primary biological function-focused and secondary "baited"-global cluster analysis of gene expression profiles. Primary cluster analysis focused on mRNAs encoding matrix proteins and modulators of matrix turnover as classified by Onto-Compare and Gene Ontology and identified both molecules and pathways already implicated in the pathogenesis of TIF (e.g. transforming growth factor beta1-CTGF-fibronectin-1 pathway) and novel TIF-associated genes (e.g. SPARC and Matrilin-2). Specific gene expression patterns identified by primary extracellular matrix-focused cluster analysis were then used as bioinformatic bait in secondary global clustering, with which to search the renal transcriptome for novel modulators of TIF. Among the genes clustering with ECM proteins in the latter analysis were endoglin, clusterin, and gelsolin. In several notable cases (e.g. claudin-1 and meprin-1beta) the pattern of gene expression identified in adriamycin nephropathy in vivo was replicated during transdifferentiation of renal tubule epithelial cells to a fibroblast-like phenotype in vitro on exposure to transforming growth factor-beta and epidermal growth factor suggesting a role in fibrogenesis. The further exploration of these complex gene networks should shed light on the core molecular pathways that underpin TIF in renal disease.

Focal segmental glomerulosclerosis (FSGS) is a common cause of glomerulopathy and a significant cause of end-stage renal disease (1). As with other proteinuric glomerulopathies, subsequent progressive loss of renal function is associated with the development of tubulointerstitial fibrosis (TIF). Proposed stimuli for TIF in this context include glomerular-derived proteinuria, inflammatory mediators, cytokines and growth factors, and tubulointerstitial ischemia secondary to glomerular capillary injury and obsolescence (2-5). The source of fibroblasts in glomerular disease-associated TIF is still being defined. Several complementary lines of evidence from in vitro and in vivo experiments implicate transdifferentiation of tubule epithelial cells to fibroblast-like cells as a central event in this setting (6,7).
Whereas several macrostimuli for TIF have been defined in FSGS, there remains a dearth of knowledge with regard to the molecular basis of the fibrotic process and, in particular, with regard to the molecular events that drive epithelial cell dysfunction during TIF. The emergence of microarray technology, which allows large scale profiling of gene expression in biological systems, has greatly enhanced our capacity to probe the molecular pathogenesis of renal injury. In the present study, using microarrays, changes in the renal transcriptome were monitored in a murine model of FSGS, namely adriamycin nephropathy, during the transition of the tubulointerstitium from a normal to a fibrotic phenotype. Through sequential biological function-focused and "baited"-global cluster analysis, a cohort of candidate modulators of TIF was identified, several of which were also modulated during epithelial-fibroblast transdifferentiation in vitro.

MATERIALS AND METHODS
Murine Model of Adriamycin Nephrosis in Vivo-Six-week-old male BALB/c mice were kept under standard conditions and were randomly allocated to either control or experimental groups. Murine adriamycin nephropathy was established by a single injection of adriamycin at a dose of 9.5 μg/g as previously reported (8). Age-matched controls were injected with the same volume of isotonic saline. Adriamycin-treated animals were sacrificed at days 3, 7, 14, and 28 following injection.
Microarray Analysis-RNA isolation, cDNA synthesis, in vitro transcription, and microarray analysis were performed as previously reported (10). Briefly, total RNA was isolated from whole kidneys at baseline and 3, 7, 14, and 28 days post-injection of adriamycin and from murine proximal tubular cells in vitro using RNeasy Mini Column (Qiagen, Valencia, CA). cDNA was synthesized from total RNA using the Superscript Choice kit (Invitrogen). Biotin-labeled cRNA prepared from template cDNAs was fragmented and hybridized to the Affymetrix mu74av2 arrays as per the Affymetrix protocol (Affymetrix, Santa Clara, CA). Arrays were then washed and fluorescently labeled prior to scanning with a confocal scanner (Affymetrix). All in vivo time points were microarrayed in triplicate (i.e. experimental triplicates).
Image files were obtained through Affymetrix GeneChip software (MAS5). Robust multichip analysis (RMA) was then performed (11,12). RMA is an R-based technique that works directly from the Affymetrix microarray *.cel image files and comprises three steps: background adjustment, quantile normalization, and summarization. To run the analysis under the Microsoft Windows operating system, RMAexpress was used; this package is a stand-alone GUI program specifically designed to operate on Microsoft Windows and exclusively performs RMA (11,12).
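The second of the three RMA steps can be illustrated with a minimal sketch. The function name `quantile_normalize` and the toy intensities are hypothetical, and this is not the RMAexpress implementation: it only shows the idea of forcing every array onto a common intensity distribution. Ties are broken by position rather than averaged, which is a simplification.

```python
def quantile_normalize(arrays):
    """Quantile-normalize equal-length intensity lists (one list per array)."""
    n_probes = len(arrays[0])
    # Sort each array, then average across arrays at each rank.
    sorted_arrays = [sorted(a) for a in arrays]
    rank_means = [
        sum(s[rank] for s in sorted_arrays) / len(arrays)
        for rank in range(n_probes)
    ]
    normalized = []
    for a in arrays:
        # Probe indices ordered from lowest to highest intensity.
        order = sorted(range(n_probes), key=lambda i: a[i])
        out = [0.0] * n_probes
        for rank, probe in enumerate(order):
            # Each probe receives the cross-array mean for its rank.
            out[probe] = rank_means[rank]
        normalized.append(out)
    return normalized


# Invented intensities for three arrays of four probes each.
chips = [
    [5.0, 2.0, 3.0, 4.0],
    [4.0, 1.0, 4.5, 2.0],
    [3.0, 4.0, 6.0, 8.0],
]
normalized_chips = quantile_normalize(chips)
```

After this step every array contains the same set of values, so differences between arrays reflect only the rank order of the probes.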
As each in vivo time point was microarrayed in triplicate, an average RMA value was computed. To ensure that the average was statistically representative, a t test was performed and p values were generated. Only those genes with a p value of ≤0.01 were included in subsequent bioinformatic analysis (13,14). Thereafter, expression data for each time point were compared with control, and a signal log ratio of 0.6 or greater (equivalent to a fold change in expression of 1.5 or greater) was taken to identify significant differential regulation.
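The relationship between the signal log ratio and the fold change can be checked with a short sketch. It assumes that RMA values are on a log2 scale, so that the log ratio is a simple difference, and it assumes that the cutoff is applied symmetrically to down-regulated genes. The function names and example values are hypothetical, and the p value is taken as given rather than computed.

```python
def signal_log_ratio(treated_rma, control_rma):
    # RMA values are log2 intensities, so the log ratio is a difference.
    return treated_rma - control_rma


def fold_change(slr):
    # Convert a log2 ratio back to a linear fold change.
    return 2.0 ** slr


def is_differentially_regulated(treated_rma, control_rma, p_value,
                                slr_cutoff=0.6, p_cutoff=0.01):
    # Assumption: the same magnitude cutoff is used in both directions.
    slr = signal_log_ratio(treated_rma, control_rma)
    return p_value <= p_cutoff and abs(slr) >= slr_cutoff


# A signal log ratio of 0.6 corresponds to a fold change of about 1.5.
cutoff_as_fold_change = fold_change(0.6)
```

Since 2 raised to the power 0.6 is approximately 1.52, the stated cutoff of 0.6 matches the quoted fold change of 1.5 to the precision given in the text.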
Using normalized RMA values, unsupervised average linkage hierarchical cluster analysis was performed with Eisen's program (15,16). Cluster analysis groups together genes with comparable patterns of expression by applying mathematical measures of similarity to their expression profiles (15). Initially all the genes under study are assessed and the two closest genes are joined, creating the first node. Subsequent nodes are determined and added by the pairwise joining of genes, based on the distance between them, culminating in all genes belonging to a single node (15). To ensure tight cluster analysis and reduce the background noise that accompanies microarray experiments, the 12,488 genes of the Affymetrix mu74av2 chip were filtered to remove genes with very low expression at all time points (16,17). An initial cluster analysis of each individual array and the average array for each time point was performed to ensure that the replicates for each time point were sufficiently similar in expression profile to permit the use of average values thereafter.
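The pairwise joining procedure described above can be sketched as follows. This is a textbook average linkage algorithm using one minus the Pearson correlation as the distance; it is not the cited program, whose handling of merged nodes may differ. The gene names and expression values are invented, and profiles are assumed to be non-constant (a constant profile would give a zero denominator).

```python
from math import sqrt


def pearson_distance(x, y):
    """Return 1 - Pearson correlation for two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return 1.0 - cov / (sd_x * sd_y)


def average_linkage(profiles):
    """Merge clusters pairwise; return a list of (left, right, distance)."""
    clusters = [(name,) for name in profiles]
    merges = []

    def cluster_distance(a, b):
        # Average linkage: mean distance over all cross-cluster gene pairs.
        pairs = [(g, h) for g in a for h in b]
        total = sum(pearson_distance(profiles[g], profiles[h]) for g, h in pairs)
        return total / len(pairs)

    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = cluster_distance(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges


# Invented profiles over control and days 3, 7, 14, and 28.
toy_profiles = {
    "up_a": [1.0, 1.1, 1.2, 1.5, 3.0],
    "up_b": [2.0, 2.1, 2.3, 2.6, 4.2],
    "down_a": [3.0, 2.0, 1.8, 1.5, 1.0],
    "down_b": [4.0, 3.1, 2.7, 2.4, 2.0],
}
merges = average_linkage(toy_profiles)
```

In this toy case the two rising profiles and the two falling profiles are each joined first, and the final node joins the two pairs, mirroring how tight clusters appear as low branches of the tree.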
A list of 200 extracellular matrix proteins or modulators of matrix turnover was curated via the publicly available Onto-Compare and Gene Ontology (GO) databases (18,19). A complete list of these genes is available from the authors and is displayed on the Conway Institute website.

Global Changes in Gene Expression during Adriamycin-induced FSGS
Administration of adriamycin was associated with progressive glomerular injury followed by tubulointerstitial fibrosis (Fig. 1) as previously described (8). When individual and computed average microarrays were clustered there was strong alignment to their respective time point as illustrated in Fig. 2, panel A. Of the 12,488 gene sequences represented on the Affymetrix mu74av2 oligonucleotide microarray, 1.7% (211 genes), 1.8% (228 genes), 2.1% (368 genes), and 6.8% (852 genes) were perturbed at days 3, 7, 14, and 28 following adriamycin injection, respectively. Tables I and II highlight the genes whose mRNA levels changed most dramatically at days 3, 7, 14, and 28. In addition to functional classes that correlate with the typical histopathological phenotype (e.g. inflammation, proliferation, apoptosis, and matrix proteins), striking changes were also noted in cohorts of developmental and metabolism genes not classically associated with nephropathy and TIF.
With the exception of inflammatory/immune genes, which peaked earlier, gene expression levels were most strikingly increased when moving between day 14 (269 mRNA transcripts significantly up-regulated) and day 28 (661 mRNA transcripts significantly up-regulated) (Fig. 2, panel B), a phase characterized by a shift in the disease from a predominantly glomerular phenotype with proteinuria and mild renal dysfunction to established tubulointerstitial fibrosis and chronic renal failure (Fig. 1). Whereas the temporal pattern was less striking with regard to down-regulated genes, maximal disturbance was again noted at day 28.

Primary Extracellular Matrix-focused Cluster Analysis
Mathematically "tight" clusters of genes following perturbation of biological systems can imply a commonality of function or regulation (15,20). Cluster analysis has also proven a very effective tool in molecular classification of certain neoplasms and in highlighting the heterogeneity in acute renal allograft rejection (16,21,22). In the present study the expression profile of 200 known ECM proteins and modulators of ECM turnover was first assessed with unsupervised hierarchical clustering, thereby defining the major patterns of expression within this functional class during adriamycin nephropathy ("Primary ECM-focused Cluster Analysis"). Genes within these major ECM clusters were then used as bait to probe the remainder of the transcriptome and identify other genes with similar patterns of expression ("Secondary Baited-Global Cluster Analysis").
For primary cluster analysis 200 genes that encode either ECM proteins or modulators of ECM turnover were selected using Gene Ontology and Onto-Compare (18,19). Those genes with low expression at all time points were removed as previously described, resulting in average linkage unsupervised hierarchical analysis of 81 ECM genes and the identification of four prominent patterns of gene expression (Fig. 3).
Identification of SPARC, Tenascin-C and Fibrillin-1 among Major Clusters of Up-regulated Genes in TIF-Two clusters (Fig. 3, primary clusters A and B) were characterized by a striking increase in mRNA levels at day 28. The first (Cluster A) included induced TGF-β, several members of the collagen superfamily (Col1α1, Col1α2, Col3α1, Col4α1, Col4α2, Col8α1, Col15α1, and Col18α1), matrix metalloproteinase 12 (MMP12), fibronectin-1, and decorin. Also of interest was SPARC (secreted protein acidic and rich in cysteine), which, along with tenascin C and thrombospondin-1, is a member of a family of secreted regulatory proteins that modulate interactions between cells and the ECM (23). These proteins are expressed in vivo during development and during tissue remodeling following injury, consistent with the emerging paradigm, highlighted again in this study (Fig. 2, panel C), that many developmental programs are recapitulated in the context of fibrosis and repair.
SPARC is found in a wide variety of cell types where it inhibits proliferation and focal adhesion and prevents cell spreading in vitro (24). Consistent with the current study, increased SPARC mRNA expression has been shown in experimental models of Heymann nephritis, mesangioproliferative glomerulonephritis, and diabetic nephropathy (25-27). In vitro work by Francki et al. (28) demonstrated that SPARC regulated both TGF-β1 and type I collagen production in mesangial cells, which is particularly interesting given that SPARC clustered alongside type I collagens (Col1α1 and Col1α2) and TGF-β1 (Primary Cluster A) in this in vivo model of TIF (28). Together these data support the suggestion that SPARC promotes ECM production and deposition by regulating type I collagen expression through TGF-β1-dependent pathways.
The second cluster of genes up-regulated at day 28 (Fig. 3, Primary Cluster B) included TGF-β1, connective tissue growth factor (CTGF), the matrix protein type VI collagen (Col6α1 and Col6α3), tenascin C, and fibulin-1. Consistent with the hypothesis that cluster analysis can identify genes whose regulation or function is linked, CTGF is induced by TGF-β1 and promotes ECM production by fibroblasts and indeed resident renal cells (29).
Increased tenascin C expression is noteworthy as overexpression of tenascin C is associated with disruption of cell adhesion and migration in non-renal systems (21,30). In keeping with an emerging role for tenascin C in renal disease, tenascin C expression is limited to the medullary interstitium in normal kidney, whereas marked tenascin C expression has been reported in the context of human tubulointerstitial inflammation or fibrosis in a variety of pathological conditions (31). As with TGF-β, the clustering of tenascin C with CTGF was remarkable given that CTGF has been shown to induce tenascin C in human tubule epithelial cells in vitro, again supporting a role for tenascin C in TIF (32).
The finding of fibulin-1 in this cluster was also intriguing given that this fibrillar ECM protein binds to a host of basement membrane proteins including fibronectin, fibrillin, laminins, and collagens, several of which were also represented in this cluster (33). Fibulin-1 interacts with fibronectin in vitro to inhibit cell motility of mesenchymal cells, suggesting a role in interstitial fibrosis derived from EMT (see below) (34).
Identification of Matrilin-2 among Major Clusters of Downregulated Genes in TIF-The two major primary clusters of down-regulated genes were characterized temporally by an early (Fig. 3, Primary Cluster C) or late (Fig. 3, Primary Cluster D) decline in expression as the disease evolved. Both clusters included type IV collagens (Col4α3 and Col4α4), members of the laminin family (LAMA3 and LAMC3), matrix ...
Matrilin-2 (MATN2) is the largest member of the matrilin family of extracellular proteins, which have well established roles in the development and homeostasis of cartilage and bone. Unlike other family members whose expression is limited to skeletal tissues, matrilin-2 is found in a wide variety of tissues including normal human kidney (35,36). Whereas the function of matrilin-2 is poorly defined, it can interact with itself and a variety of matrix molecules including type I collagen, fibronectin, and laminin and has been postulated to act as an adaptor molecule connecting these molecules with other proteins and proteoglycans within the ECM, where it may regulate ECM homeostasis (37,38).
From the foregoing results, it will be apparent that cluster analysis of large ECM-associated cohorts of genes can both identify groups of genes with known functional or regulatory associations and suggest novel molecular relationships in ECM homeostasis in TIF. With regard to known fibrotic pathways, the prominence of the TGF-β-CTGF-Collagen axis (Fig. 4) was particularly noteworthy and added to the expanding body of evidence implicating CTGF as an important mediator of renal fibrosis.

Secondary Baited-global Cluster Analysis
The gene expression patterns identified by primary ECM-focused cluster analysis were used as bioinformatic bait with which to probe the renal transcriptome for other novel modulators of ECM turnover in TIF. To this end, average linkage unsupervised hierarchical analysis was performed on genes of the Affymetrix mu74av2 microarray with an RMA signal log ratio of < -0.5 and/or > +0.5 at one or more time points so as to reduce confounding noise (15-17). In total, 1372 mRNA transcripts underwent secondary baited-global cluster analysis and all ECM genes maintained the same pattern of expression as in primary level cluster analysis; however, the cluster tree was now larger, incorporating other genes with comparable patterns of expression (Figs. 5-8).
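The baiting idea can be illustrated with a simplified sketch. In the study, neighbours of the ECM genes were read from a hierarchical cluster tree of all filtered transcripts; the sketch below substitutes a plainer stand-in, ranking filtered genes by their Pearson correlation with a single bait profile. The gene names, the correlation cutoff `min_r`, and all values are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def passes_noise_filter(slr_by_timepoint, cutoff=0.5):
    # Keep a gene if its signal log ratio is < -cutoff or > +cutoff
    # at one or more time points.
    return any(abs(v) > cutoff for v in slr_by_timepoint)


def genes_like_bait(bait_name, profiles, min_r=0.9):
    """Return filtered genes whose profile correlates with the bait."""
    bait = profiles[bait_name]
    hits = []
    for name, profile in profiles.items():
        if name == bait_name or not passes_noise_filter(profile):
            continue
        r = pearson(bait, profile)
        if r >= min_r:
            hits.append((r, name))
    # Strongest correlation first.
    return [name for r, name in sorted(hits, reverse=True)]


# Invented signal log ratios at days 3, 7, 14, and 28.
slr_profiles = {
    "ecm_bait": [-0.4, -0.1, 0.3, 1.2],
    "candidate_a": [-0.5, -0.2, 0.4, 1.5],
    "candidate_b": [0.9, 0.6, 0.2, -0.1],
    "flat_gene": [0.1, -0.1, 0.2, 0.3],
}
bait_neighbours = genes_like_bait("ecm_bait", slr_profiles)
```

Here only the candidate that shares the biphasic shape of the bait is returned; the oppositely regulated gene is rejected by the correlation cutoff and the flat gene by the noise filter.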
Identification of Clusterin, Endoglin, and Gelsolin among Major Clusters of Up-regulated Genes-As with primary level cluster analysis, the dominant pattern of expression in secondary level cluster analysis was characterized by maximal up-regulation at day 28 (Fig. 5). Among the genes aligning to collagens and fibronectin was clusterin (CLU) (Fig. 5, Secondary cluster A3), a glycoprotein that is expressed by many tissues and is implicated in diverse cellular responses (39). Whereas increased clusterin mRNA and protein levels have been observed in several non-renal diseases including neurodegenerative and neoplastic disorders, increased clusterin expression along with fibronectin has been reported in renal perfusion-ischemia in the rat (39-41). Interestingly, Aoyama et al. (42) noted that the oral adsorbent AST-120 alleviates progressive renal failure and TIF in a rat model of type 2 diabetic nephropathy, a protective effect that was associated with reduced renal expression of intercellular adhesion molecule 1, TGF-β1, and clusterin. Clusterin expression is induced in non-renal in vitro systems by oxidative, thermal, and apoptotic stimuli, suggesting a role in cellular stress or death responses (43).

FIG. 3. Unsupervised hierarchical cluster analysis of 81 genes encoding ECM proteins or modulators of ECM turnover (primary level cluster analysis).
Four major patterns of ECM gene expression were identified: panels A and B demonstrate genes whose expression was maximal at day 28. In the case of panel B these genes demonstrated a biphasic response; early in the disease, when there is little clinical or light microscopy evidence of injury, these genes were down-regulated, but as the disease progressed and renal fibrosis occurred they were maximally expressed. This contrasts with panel C, where gene expression was maximal at control relative to other time points; in other words, gene expression was either unaffected or down-regulated as the disease progressed. Finally, panel D demonstrates genes that were down-regulated as the disease evolved.
FIG. 6. Genes whose expression is initially depressed but then later demonstrate maximal expression at day 28 include biglycan and SPARC, which are known ECM proteins, but within this tight cluster (secondary cluster B1) is now endoglin, a component of the TGF-β receptor system. Likewise in secondary cluster B3 TGF-β1 is aligned to members of the tubulin α superfamily (tubulin α 1, 2, and 6), which are the basic building blocks of microtubules.
Primary ECM-focused cluster analysis revealed a cluster of ECM genes with a biphasic response (Fig. 3, Primary cluster B), an initial down-regulation being followed by up-regulation with maximal expression at day 28. This cluster includes CTGF, biglycan, and transgelin. With secondary baited-global clustering these genes now align alongside endoglin (ENG), thrombomodulin (THBD), gelsolin (GSN), claudin 4 (CLDN4), and TGF-β inducible early growth response-1 (TIEG1), the latter being a member of the TGF-β superfamily (Fig. 6). ENG is a membrane glycoprotein, originally identified as being expressed in human vascular endothelial cells (44). Defects in the endoglin gene result in hereditary hemorrhagic telangiectasia type-1 (45). Intriguingly, it is thought to be a component of the TGF-β receptor complex as it binds both TGF-β1 and TGF-β3 receptors with high affinity in human endothelial cells, and in vitro work has shown that overexpression of ENG inhibits TGF-β1 activity, thereby modulating several cellular responses to TGF-β1 including fibronectin and plasminogen activator inhibitor type-1 (46,47). ENG expression is increased in TIF secondary to both unilateral ureteric obstruction and renal mass reduction in vivo (48,49). Whereas the mechanism for this up-regulation remains unclear, TGF-β induces ENG expression in mesangial cells in vitro, raising the possibility that ENG may be a TGF-β response gene in other renal compartments also (50).
GSN is a calcium-regulated actin-binding protein, expressed in many tissues, that is a major actin-severing molecule (51,52). Although not previously linked to renal fibrosis, GSN has been implicated in renal oncogenesis, and mutations in this gene result in hereditary amyloidosis, Finnish type (53-55). In the context of this study it is noteworthy that GSN also binds to fibronectin, and early studies suggested that this interaction might serve to localize plasma gelsolin in regions where fibronectin is deposited, such as at inflammatory sites. It was striking that GSN clustered with CTGF (Fig. 6, Secondary cluster B2) in this mouse study, as CTGF induces actin disassembly and motility of a variety of cell types including renal cells in vitro (3,29).
Identification of Activating Transcription Factor (ATF) 3, Transmembrane-7 Superfamily Member-1 (TM7SF1), and Meprin-1β among Major Clusters of Down-regulated Genes-As discussed earlier, the two major patterns of down-regulated ECM genes identified by primary level cluster analysis were either an early or late decline in expression as TIF evolved. In secondary baited-global cluster analysis, Fig. 7 shows clusters of genes built around the ECM genes laminin-α3 and lysyl oxidase-like, characterized by a dramatic fall in expression at day 3 following adriamycin injection but subsequently a gradual increase in expression toward baseline. Interestingly, many of the genes that aligned with these molecules by secondary analysis exhibited marked similarity of function (Fig. 7, secondary cluster C1 and C2) and were either involved in stress response (heat shock proteins) or were modulators of transcription (ATF3, early growth response 2 (EGR2), DNA-damage inducible transcript 3 (DDIT3), and RE1-silencing transcription factor) (18,19).
ATF3, a member of the ATF/cAMP responsive element-binding protein family of transcription factors, encodes a protein that represses rather than activates transcription. It is induced by stress signals in a variety of different cells and tissues including reperfusion in renal ischemic-reperfusion injury in vitro (56,57). In the context of this study, it is noteworthy that ATF3 has a similar pattern of expression to the mRNA transcript DDIT3, which encodes a nuclear protein that is a dominant negative regulator of the transcription factors C/EBP, liver activator protein, and ATF3 (58,59). Experimentally DDIT3 has been shown to be induced by neuronal hypoxia, but although it is expressed in normal kidney, little is known about the role of DDIT3 in renal disease (60,61). Fig. 8 shows two clusters built around superoxide dismutase 3 and type IV collagen, α4 (Col4α4) that are characterized by a transient mild increase in expression early (day 3) but a subsequent significant decline in expression from day 14 to 28. Genes within the superoxide dismutase 3-associated cluster include TM7SF1, thiopurine methyltransferase, and meprin-1β (MEP1β), the latter of which belongs to a family of multidomain zinc metalloproteases that are highly expressed in mammalian kidney, intestinal brush-border membranes, leukocytes, and certain cancer cells (62). MEP1β will be discussed in detail later. TM7SF1 (Fig. 8, secondary cluster D1) is a member of the G-protein-coupled receptor superfamily. Although little is known about the regulation and function of this gene, it has been shown to be strongly up-regulated in kidney development (63). Intriguingly, TM7SF1 expression is down-regulated in some Wilms tumors (63).

Expression of Genes Identified in Adriamycin Nephropathy in Vivo during EMT in Vitro
EMT of tubule epithelial cells toward a fibroblast-like phenotype in response to TGF-β and other mediators appears to be an important source of fibroblasts in TIF (6,7). In the present study we next assessed the expression levels of several genes identified as being perturbed in TIF in vivo during EMT in vitro as an initial approach to the assessment of their role in epithelial cell dysfunction in TIF. Among genes undergoing similar patterns of response in vivo and in vitro were induced TGF-β, tenascin C, MMP13, and decorin (Tables III and IV). Global secondary level baited cluster analysis in turn identified some intriguing molecules including claudin-1, which was induced, and MEP1β, which was down-regulated. As mentioned earlier, MEP1β is a zinc endopeptidase that is expressed in kidney proximal tubular cells and cleaves cytokines including monocyte chemotactic peptide-1 as well as hydrolyzing fibronectin, type IV collagen, and other basement membrane components in vitro (64). In agreement with the results of the present study, several other lines of evidence have linked decreased MEP1β expression with renal injury. We recently reported a fall in MEP1β mRNA levels in acute renal ischemic reperfusion injury (10). As in this study, there is substantial experimental evidence showing decreased MEP1β expression with tubular injury: Ricardo et al. (65) also noted an early and progressive decline in MEP1β expression in a rat model of UUO.
Claudins are a large family of tight junction integral membrane proteins, which play critical roles in cell-to-cell contact and barrier function in epithelial and endothelial cells in both renal and non-renal tissues. Claudin proteins, including claudin-1, have been shown in vitro to recruit many membrane-type matrix metalloproteinases and pro-matrix metalloproteinase-2 (pro-MMP-2), thus enhancing activation of pro-MMP-2 (66). We recently reported induction of claudin-1 mRNA levels following acute renal ischemic reperfusion injury (10). The results of the present study support a role for claudins in the pathogenesis of chronic renal disease.
In summary, sequential ECM-focused primary cluster analysis followed by baited-global secondary cluster analysis identified both established and novel gene networks that are potential drivers of TIF in vivo. This analysis provides the transcriptomic framework upon which to base further pathway and molecule-specific interrogations of the complex web of molecular events that subserve TIF. Such further application of this analytical approach to the study of renal fibrosis and other pathophysiological processes promises to shed new light on fundamental mechanisms of disease.