Identification of Novel, Functional Genetic Variants in the Human Matrix Metalloproteinase-2 Gene

Matrix metalloproteinase-2 (MMP-2) is an enzyme with proteolytic activity against matrix and nonmatrix proteins, particularly basement membrane constituents. Thus, any naturally occurring genetic variants that directly affect gene expression and/or protein function would be expected to impact on progression of pathological processes involving tissue remodeling. We scanned a 2-kilobase pair promoter region and all 13 exons of the human MMP-2 gene, from a panel of 32 individuals, and we identified the position, nature, and relative allele frequencies of 15 variant loci as follows: 6 in the promoter, 1 in the 5′-untranslated region, 6 in the coding region, 1 in intronic sequence, and 1 in the 3′-untranslated region. The majority of coding region polymorphisms resulted in synonymous substitutions, whereas three promoter variants (at −1306, −790, and +220) mapped onto cis-acting elements. We functionally characterized all promoter variants by transient transfection experiments with 293, RAW264.7, and A10 cells. The common C → T transition at −1306 (allele frequency 0.26), which disrupts an Sp1-type promoter site (CCACC box), displayed a strikingly lower promoter activity with the T allele. Electrophoretic mobility shift assays confirmed that these differences in allelic expression were attributable to abolition of Sp1 binding. These data suggest that this common functional genetic variant influences MMP-2 gene transcription in an allele-specific manner and is therefore an important candidate to test for association in a wide spectrum of pathologies for which a role for MMP-2 is implicated, including atherogenesis and tumor invasion and metastasis.

The matrix metalloproteinases (MMPs)¹ constitute a family of secreted and membrane-associated zinc-dependent endopeptidases that are capable of selectively degrading a wide spectrum of both extracellular matrix and nonmatrix proteins (1). Currently upwards of 20 vertebrate MMPs have been reported that can be categorized, by substrate specificity, to give the collagenases, stromelysins, gelatinases, and membrane-type MMPs. The broad range of substrates conveys a pivotal role for MMP involvement during both normal physiological processes (e.g. embryonic development, bone remodeling, angiogenesis, nerve growth, etc.) and pathological states (e.g. arthritis, cancer, atherosclerosis, liver fibrosis, etc.) (2). Accordingly, MMP activity is tightly coordinated at several levels including transcriptional regulation, activation of latent zymogen, and interaction with endogenous inhibitors (3).
MMP-2 (gelatinase A) has type IV collagenolytic activity and is constitutively expressed by most connective tissue cells including endothelial cells, osteoblasts, fibroblasts, and myoblasts. The membrane-bound activation of pro-MMP-2 ensures that proteolytic activity, predominantly against components of the basement membrane, is localized to discrete regions on the cell surface (4-6) thereby potentiating extracellular matrix remodeling as well as uniquely generating several different biologically active molecules including laminin, fibronectin, and monocyte chemoattractant protein-3 (7-10). Indeed, the majority of MMP-2 studies have focused on demonstrating an essential role in promoting cell invasiveness during tumor angiogenesis, arthritis, and atherogenesis (11-15), as well as tumor metastasis where levels of MMP-2 expression can be correlated with tumor grade (16,17). Not surprisingly, the design of specific and selective inhibitors of MMP-2, for therapeutic intervention, remains an intense focus of research (18).
The diseases in which a role for MMP-2 has been demonstrated are characterized by varying individual susceptibility, implying the role of genetic factors. Traditional linkage analysis methods for mapping the genes of Mendelian disorders have not been as successful in the studies of complex genetic diseases including coronary heart disease, cancers, and arthritic disorders. Attention has therefore focused on the rapid elucidation of a new class of genetic markers termed single-nucleotide polymorphisms (SNPs). These are the most common types of stable genetic variants estimated to occur, on average, every 1,000 bp and are therefore valuable markers in tests of association for susceptibility, or resistance, to common and genetically complex diseases (19-21) and pharmacogenetic traits (22). Indeed, the emergence of the common disease-common variant hypothesis (21,23) has provided important examples of such associations, including the APOE-4 allele in Alzheimer's disease (24) and the CCR5 allele in HIV resistance (25). However, the power of association-based studies can be compromised by multiple hypothesis testing that is exacerbated by publication bias (26). This problem can be overcome, in part, by distinguishing functional SNPs from neutral counterparts across a candidate region thereby generating a panel of robust, informative variants that are more likely to have an influence in disease progression.

* This work was supported in part by British Heart Foundation Grant RG/1995008. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
The nucleotide sequence(s) reported in this paper has been submitted to the GenBank™/EBI Data Bank with accession number(s) AJ298926.
§ Recipient of a British Heart Foundation Studentship Grant FS/97051.
‖ British Heart Foundation Basic Science Lecturer and supported by Grant BS/99003.
** To whom correspondence should be addressed: Dept. of Cardiovascular Medicine, John Radcliffe Hospital, Oxford OX3 9DU, UK. Tel.: 44-0-1865-220257; Fax: 44-0-1865-768844; E-mail: hugh.watkins@cardiov.ox.ac.uk.
¹ The abbreviations used are: MMPs, matrix metalloproteinases; SNPs, single-nucleotide polymorphisms; PCR, polymerase chain reaction; DHPLC, denaturing high performance liquid chromatography; 5′-UTR, 5′-untranslated region; 3′-UTR, 3′-untranslated region; Sp1, stimulating protein 1; CDE, cell cycle-dependent element; kb, kilobase pair; bp, base pair; nt, nucleotide; EMSAs, electrophoretic mobility shift assays.
The specific predilection of MMP-2 for substrates important in basement membrane integrity makes it a strong candidate for a number of heritable traits including, for example, atherosclerosis (by allowing immigration of smooth muscle cells and by facilitating plaque rupture). Common variants that alter the amount of protein expressed, for instance by affecting transcriptional regulation, or that subtly alter the activity of the protein itself would be expected to have a quantitative influence on disease activity. We have therefore identified the nature and extent of genetic variation in the promoter (~2 kb) and complete coding region of the human matrix metalloproteinase-2 gene. We describe the in vitro characterization of the entire panel of promoter polymorphisms, and we identify one particular functional variant that alters MMP-2 promoter activity through allele-specific binding of the transcription factor Sp1.

Isolation of the Human MMP-2 Promoter-The Human Genome Walking kit (CLONTECH) was used to obtain additional MMP-2 5′-flanking sequence to that previously published (27). A primer complementary to the sequence +51/+72 relative to the major transcription start site (28) was used in combination with an adaptor primer in primary PCRs using five human genomic libraries as templates. A secondary PCR, using a nested MMP-2-specific primer (−13/+10 relative to the transcription start site), was performed, and products from the DraI, SspI, and ScaI libraries were gel-purified and cloned into pBluescript II SK(−) (Stratagene). Sequence extending 1900 bp upstream of the transcription start site was obtained by sequencing several independent clones on an ABI 377 automated sequencing apparatus with cycle sequencing according to the manufacturer's instructions (PerkinElmer Life Sciences).
Primer Design for Scanning the MMP-2 Gene-Eleven overlapping PCR amplicons were designed to analyze the promoter sequence (1900 bp), whereas 25 PCR fragments were generated to scan the 5′-UTR, coding region, and 3′-UTR (see Table I). To address the extent of genetic variation at splice site junctions, scanning primers were designed based on sequence information obtained by amplifying between exons, using oligonucleotides based on the genomic organization (28).
Scanning the MMP-2 Gene-Denaturing high performance liquid chromatography (DHPLC) was used to scan the MMP-2 gene for sequence variation, using the WAVE DNA Fragment Analysis System (Transgenomic), as previously described (29). Optimal PCR conditions for faithful amplification were derived for each amplicon; reactions were performed in 50-µl volumes containing 20 ng of genomic DNA, 50 mM potassium chloride, 25-pmol primers, 200 µM dNTPs, 1 mM MgCl2, and 2.5 units of PfuTurbo polymerase (Stratagene). Genomic DNA was amplified from a panel of 32 unrelated healthy Caucasian subjects. PCR products (45 µl) were denatured at 95°C for 4 min and allowed to reanneal to form homo- and heteroduplexes by cooling to 25°C over a 50-min period. 10 µl of each PCR product was applied to the WAVE™ machine using varying column temperatures (57-69°C), as predetermined by the Transgenomic Analysis Software (according to the predicted melting characteristics of each amplicon). Individual PCR products, which displayed heteroduplex signaling patterns, were purified with QIAquick purification columns (Qiagen), and both strands were sequenced on an ABI 377.
Cell Lines and Transient Transfections-The RAW264.7, A10, and 293 cell lines were obtained from the Sir William Dunn School of Pathology Cell Bank, and reagents were purchased from Life Technologies, Inc. Cells were grown in RPMI 1640 (RAW264.7) or Dulbecco's modified Eagle's medium (A10 and 293) supplemented with 10% (v/v) heat-inactivated fetal calf serum, 2 mM glutamine, and antibiotics at 37°C and 5% CO2 in a humidified incubator. For transient transfection experiments, 5 × 10⁴ cells were plated in 10-mm 24-multiwell plates and grown to 60-70% confluence. Transfection was carried out using FuGENE 6 Transfection Reagent (Roche Molecular Biochemicals) according to the manufacturer's protocol. Cells were cotransfected with 0.5 µg of reporter plasmid and 0.1 µg of pcDNA3-β-galactosidase expression vector (30) to standardize for transfection efficiency. Cells were incubated for 24 h, washed twice in phosphate-buffered saline, and harvested by the addition of 100 µl of lysis buffer (Luciferase Assay System, Promega). Luciferase levels were quantified using a Luminoskan Ascent Luminometer (Labsystems), and β-galactosidase activities were measured using a commercially available ELISA kit (Roche Molecular Biochemicals). All experiments were carried out in triplicate and independently performed at least three times. Results were expressed as a ratio of luciferase activity to β-galactosidase activity, and statistical levels of significance, for comparison between transfections, were determined by the Student's t test. Mammalian expression vectors containing wild-type and mutant murine GATA-1 were a kind gift from Dr. Sjaak Philipsen (Erasmus University, Rotterdam, The Netherlands).
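The normalization and comparison described above can be sketched in a few lines. This is a minimal illustration only: the readings are made-up numbers rather than data from this study, the function names are hypothetical, and the t statistic is the textbook pooled-variance form, which is assumed here and may differ in detail from the software the authors used.

```python
from math import sqrt
from statistics import mean, stdev


def normalize(luciferase, beta_gal):
    """Return one luciferase/beta-galactosidase ratio per well."""
    return [luc / gal for luc, gal in zip(luciferase, beta_gal)]


def student_t(a, b):
    """Two-sample Student's t statistic using a pooled variance estimate."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))


# Hypothetical triplicate readings in arbitrary units (NOT values from the paper).
allele_c = normalize([900.0, 1000.0, 1100.0], [10.0, 10.0, 10.0])
allele_t = normalize([300.0, 350.0, 400.0], [10.0, 10.0, 10.0])

t_stat = student_t(allele_c, allele_t)
print(f"mean C = {mean(allele_c):.1f}, mean T = {mean(allele_t):.1f}, t = {t_stat:.2f}")
```

Dividing by the β-galactosidase signal before any comparison means that a well with poor transfection efficiency does not masquerade as a weak promoter.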

Isolation of the Human MMP-2 Promoter-To generate additional sequence data for variant analysis, an extended promoter region of the human MMP-2 gene was isolated. Nested MMP-2-specific primers were used with adaptor primers to amplify the promoter from five adaptor-ligated human genomic libraries. The DraI, SspI, and ScaI libraries yielded products of 1.7, 4.1, and 6.0 kb, respectively, that were cloned and analyzed by restriction enzyme mapping. The DNA sequence extending 1900 bp upstream of the major transcription start site (28) was determined, and 11 overlapping amplicons were designed for DHPLC analysis.
Determination of Flanking Intronic Sequence-Additional intronic sequence was generated to that available (28) by sequencing introns, isolated as PCR products, and the information obtained was used to design appropriate primers for DHPLC analysis to evaluate the extent of genetic variation at MMP-2 splice-site junctions.
Identification of Novel Genetic Variants in the MMP-2 Gene-Thirty-four different PCR primer pairs, encompassing the promoter sequence (1900 bp), 5′-UTR, complete coding region, and 3′-UTR, were designed to assess the nature and extent of nucleotide variation in the human MMP-2 gene (see Table I). Amplicons were generated from a panel of 32 unrelated healthy Caucasian individuals. This number was chosen to provide >90% power to detect polymorphisms with an allele frequency of >5% (21), on the basis that common variants were the principal target of this screen. PCR products were applied to the DHPLC column and subjected to partial heat denaturation, over a 6.8-min interval, to produce sequence-specific chromatograms. Elution profiles were compared with each other, with double or multiple peak patterns indicating the presence of polymorphic site(s) (Fig. 1). The nature and location of variants were determined by dye-terminator sequencing.
By using this approach, a total of 15 novel sequence variants were identified in the MMP-2 gene. All variants were single base substitutions, comprising seven transversions and eight transitions distributed throughout the gene with six variants in the promoter, one in the 5′-UTR, six in the coding region, one in intervening sequence, and one in the 3′-UTR (Fig. 2).
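The power statement for a 32-person panel can be checked with a simple binomial argument. The sketch below assumes that the 64 sampled chromosomes are independent draws and that the scanning method detects every heterozygote; both are simplifying assumptions, and this may not be the exact calculation behind the cited figure. The function name is hypothetical.

```python
def detection_probability(allele_frequency, n_individuals):
    """Probability that the minor allele appears at least once among 2n chromosomes."""
    chromosomes = 2 * n_individuals
    return 1.0 - (1.0 - allele_frequency) ** chromosomes


# 32 diploid individuals, minor allele frequency of 5%.
power = detection_probability(0.05, 32)
print(f"probability of sampling the allele at least once: {power:.3f}")
```

Under these assumptions the probability of sampling a 5% allele at least once exceeds 0.9, and it rises further for more common alleles.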
Analysis of Variants in the MMP-2 Coding Region, Introns, and 3′-UTR-The majority of coding region variants (Fig. 2A) result in synonymous substitutions; however, the G → A transition at cDNA position 1646 causes a nonconservative amino acid change from glycine to serine (G456S). This amino acid is situated in the hinge region of the protein, which links the catalytic domain to the hemopexin-like domain, and is believed to be important in the targeting of substrates (32). Clustal alignment revealed that Gly-456 is conserved across species (chicken, rabbit, rat, and mouse) demonstrating the importance of this small neutral residue and signifying the potential impact that a substitution to a larger hydrophilic amino acid may have upon enzymatic activity/substrate specificity. We are currently undertaking in vitro studies to address possible functional effects of this variant.
We identified one intronic sequence variant in intron 5, located 11 bp downstream of the intron start site. This does not map to known splice-site junction consensus sequences and is therefore unlikely to affect mRNA splicing patterns. We also identified a common A → C transversion in the 3′-UTR of the gene (nt 2523). Sequence analysis established that this variant does not lie within, or in close proximity to, adenylate/uridylate-rich elements that are known to be bound by families of proteins implicated in the regulation of mRNA stability (33).
Analysis of Variants in the MMP-2 Promoter and 5′-UTR-Six variants were identified in the promoter region (four transversions and two transitions) that were evenly distributed across a ~2-kb interval with the exception of two T → G transversions at −790 and −787 (Fig. 2B). Sequence analysis demonstrated an error in the published sequence (27) with the AG dinucleotide reported at position −1569 actually being the G → A polymorphism at −1575 (allowing for the different transcription start sites mapped in these studies). To determine whether variants created or abolished potential cis-acting elements, we used the TRANSFAC data base (34) to identify polymorphisms that might have an effect on transcription through altering the binding of transcription factors (see Table II). Interestingly, three variants mapped onto regions displaying 100% homology to previously reported consensus sequences as follows: stimulating protein 1 (Sp1) at −1306 (35), an inverted GATA-1 site at −790 (36), and a cell cycle-dependent element (CDE) in the 5′-UTR, at +220 (37). We therefore considered the +220 variant as being potentially integral to the regulatory mechanisms imposed by the MMP-2 promoter and included it in our subsequent analyses.
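The transition/transversion tally for the promoter can be reproduced from the allele pairs named in the text (−1575G/A, −1306C/T, −955C/A, −790T/G, −787T/G, and −168G/T). The following is a minimal sketch; the function and variable names are hypothetical and are not part of any published tool.

```python
from collections import Counter

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def substitution_class(ref, alt):
    """Classify a single-base substitution as a transition or a transversion."""
    ref, alt = ref.upper(), alt.upper()
    bases = {ref, alt}
    if ref == alt or not bases <= (PURINES | PYRIMIDINES):
        raise ValueError(f"not a valid substitution: {ref}>{alt}")
    # Purine<->purine or pyrimidine<->pyrimidine changes are transitions.
    same_class = bases <= PURINES or bases <= PYRIMIDINES
    return "transition" if same_class else "transversion"


# Promoter variants, keyed by position relative to the transcription start site.
promoter_variants = {
    "-1575": ("G", "A"),
    "-1306": ("C", "T"),
    "-955": ("C", "A"),
    "-790": ("T", "G"),
    "-787": ("T", "G"),
    "-168": ("G", "T"),
}

counts = Counter(substitution_class(ref, alt) for ref, alt in promoter_variants.values())
print(dict(counts))
```

The classification is symmetric, so it does not matter which allele of each pair is treated as the reference.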

Generation of Reporter Gene Constructs to Measure Differences in Allelic Expression between MMP-2 Promoter Variants-To distinguish functional variants from nonfunctional neutral counterparts, we generated a panel of reporter gene constructs that were used to measure differences in allelic expression, in the context of the regulatory region in which the variant was located. Specifically, four reporter gene constructs were made per variant, in which three concatenated copies of the 24-bp nucleotide region flanking a variant were cloned immediately upstream, and in both orientations, of the human EIF-4AI minimal promoter (30) driving the expression of a luciferase reporter gene (see Fig. 3). The resultant constructs were used to transfect transiently several different cell lines including epithelial cells (293), macrophages (RAW264.7), and smooth muscle cells (A10), using a pcDNA3-β-galactosidase expression vector to standardize for transfection efficiency. Data were presented as the fold increase in allelic expression relative to empty vector (pEIF-4AI) (see Table III).
Prioritization of those variants most likely to be functional was based on previous studies that proposed a minimum threshold of a 2-fold difference in allelic expression as being a plausible indicator of functionality (38,39). Based on these guidelines the −1575G/A and −955C/A variants were classified as neutral as they consistently produced similar levels of luciferase activity for both alleles in all cell lines studied; the −168G/T polymorphism produced allelic differences in some cell lines, but these were below the 2-fold threshold. Thus, these transfection data correlated with our prior data base analysis confirming that these variants do not map to known cis-acting elements (Table II). Additionally the +220G/C variant, mapped to a CDE consensus sequence (CGCGG), exhibited varying luciferase activities that were beneath the 2-fold threshold for differences in allelic expression.
In contrast, the Sp1 consensus sequence (CCACC) spanning the −1306 polymorphic site displayed allele-specific transcriptional effects as −1306C transfectants expressed at least 2-fold higher luciferase activity than cells transfected with −1306T constructs: 1.71 ± 0.25 versus 0.57 ± 0.09, p < 0.05 (−1306C and −1306T, respectively, RAW264.7). This effect was observed in several other cell lines (data not shown) and collectively provided strong supportive data correlating reduced expression from the T allele with the abolition of the Sp1 consensus site. A similar association between −790 allelic expression and an inverted GATA-1 element (CTATCT) could not, however, be inferred as comparable luciferase levels were measured between these constructs as follows: 6.12 ± 0.22 versus 6.85 ± 0.09, p < 0.05 (−790T/−787T and −790G/−787T, A10). Indeed, similar results were attained when cells were cotransfected with a GATA-1 expression vector (data not shown).
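The 2-fold prioritization rule can be applied directly to the mean values quoted above. The sketch below uses only the reported means (standard deviations are ignored); the function name and the dictionary layout are hypothetical.

```python
FOLD_THRESHOLD = 2.0  # minimum allelic difference proposed as an indicator of functionality


def fold_difference(activity_a, activity_b):
    """Ratio of the higher to the lower mean activity, so the result is always >= 1."""
    high, low = max(activity_a, activity_b), min(activity_a, activity_b)
    return high / low


# Mean fold activities quoted in the text for two allelic comparisons.
comparisons = {
    "-1306C vs -1306T (RAW264.7)": (1.71, 0.57),
    "-790T/-787T vs -790G/-787T (A10)": (6.12, 6.85),
}

for label, (first, second) in comparisons.items():
    fold = fold_difference(first, second)
    verdict = "meets" if fold >= FOLD_THRESHOLD else "is below"
    print(f"{label}: {fold:.2f}-fold, {verdict} the {FOLD_THRESHOLD:.0f}-fold threshold")
```

Taking the ratio of the larger to the smaller value keeps the comparison independent of which allele is listed first.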
Curiously, the −787G allele, downstream of the GATA-1 site, did appear to regulate transcription in a cell type-specific manner. For example, −790T/−787G transfectants in 293 cells showed a 4.3-fold reduction in promoter activity compared with … tional neutral SNPs, whereas the −1306 C → T transition may influence MMP-2 promoter activity in an allele-specific manner.

FIG. 1. Nucleotide variation is detected by high resolution separation of heteroduplexes, which form in PCR samples having internal sequence variation, from homoduplex counterparts. For example, individual A displays a single peak of homoduplex DNA, whereas individuals B and C exhibit multiple peaks caused by heteroduplex generation during PCR amplification. Sequence analysis identifies the nature and location of variants, individual B (G → A, nt 1660), individual C (G → A, nt 1646), thereby illustrating the specificity and sensitivity of this technique. The x axis displays column retention time in minutes; the y axis is a measure of absorbance (converted to microvolts); the common deflection peak at 0.5 min is due to residual dNTPs and unincorporated primers from the PCR.
Allele-specific Binding of Nuclear Protein at the −1306 Polymorphic Site of the MMP-2 Promoter-We performed EMSAs to investigate whether differences in allelic expression between the −1306C and −1306T allele were attributable to the differential binding of nuclear protein(s). In these assays, two oligonucleotide probes corresponding to the sequence from −1317 to −1294 in the MMP-2 promoter, with either a T or C at the −1306 polymorphic site (Fig. 4A), were ³²P-labeled and allowed to interact with crude nuclear extracts prepared from different cell lines including smooth muscle cells (A10), epithelial cells (293), and monocytic leukemia cells (U937) (Fig. 4, B-D, respectively). Two DNA-protein complexes (designated I and II) were consistently detected with the −1306C probe, but not the −1306T probe, in these assays, irrespective of cell type. To determine the sequence specificity of these DNA-protein complexes, competition experiments were performed. Both bands were competed with 50- and 100-fold excess of unlabeled −1306C probe (lanes 10 and 11) but not by 50- or 100-fold excess of unlabeled −1306T probe (lanes 12 and 13). Furthermore, the specificity of these bands was confirmed by addition of 100-fold excess of nonspecific competitor (lane 14). In contrast, a third DNA-protein complex (Fig. 4C, designated III) was confirmed to be nonspecific using these controls. These assays clearly demonstrated the ability of the −1306C allele, not the −1306T allele, to bind nuclear protein(s) specifically.
The Transcription Factor Sp1 Binds to the −1306 C Allele but Not the T Allele-To determine the identity of the nuclear protein(s) that bind, in an allele-specific manner, to the MMP-2 promoter sequence at the −1306 site, we performed additional EMSAs to investigate the observation that an Sp1 consensus sequence (CCACC) is abolished by the presence of a T at the −1306 site; −1307 C(C/T)ACC −1303. We therefore performed EMSAs using two oligonucleotide probes comprising the previously described −1306C probe and an Sp1 consensus probe (5′-ATTCGATCGGGGCGGGGCGAGC-3′). As seen in Fig. 5A, ³²P-labeled Sp1 consensus probe forms two specific DNA-protein complexes (lane 2) identical to those observed with the −1306C probe (lane 9), each being eliminated by excess unlabeled probe (lane 3, and lanes 10 and 14, respectively). Lanes 6 and 13 demonstrate the specificity of these DNA-protein complexes by competition experiments with 100-fold excess of unlabeled mutated Sp1 consensus probe (5′-ATTCGATCGGTTCGGGGCGAGC-3′). Furthermore, the ability of the C allele, but not the T allele, to compete specifically for Sp1 binding was confirmed by additional competition experiments using 100-fold excess of unlabeled −1306T probe (lanes 4 and 12), 50- or 100-fold excess of unlabeled −1306C probe (lanes 5 and 7), and 100-fold excess of unlabeled Sp1 consensus probe (lane 11). Supershift assays were then performed to confirm Sp1 binding using the Sp1 consensus probe (Fig. 5B, lanes 1-6) or −1306C probe (Fig. 5B, lanes 7-12) in either the absence (lanes 1 and 7) or presence of different antibodies (lanes 2-6 and lanes 8-12, respectively). Both DNA-protein complexes were successfully supershifted with anti-Sp1 antibody (lanes 2 and 8). In contrast, preincubation of Sp1 antibody with an Sp1 antibody-specific blocking peptide abrogated the formation of a supershift complex (lanes 3 and 9). The specificity of the Sp1 supershift was further confirmed using a variety of isotype-matched antibodies (lanes 4-6 and lanes 10-12, respectively). The results obtained from these EMSAs were reproduced using several different nuclear extracts as well as recombinant Sp1 protein (data not shown).
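The sequence logic behind these assays is easy to make explicit: the −1306 allele determines whether the 5-nt stretch from −1307 to −1303 reads as a CCACC box, and the mutated consensus probe differs from the Sp1 consensus probe at two adjacent positions. The sketch below is illustrative only; the helper names are hypothetical, and the probe sequences are copied from the text with any line-break hyphen removed.

```python
SP1_BOX = "CCACC"

# Probe sequences as given in the text (5' to 3').
SP1_CONSENSUS = "ATTCGATCGGGGCGGGGCGAGC"
SP1_MUTANT = "ATTCGATCGGTTCGGGGCGAGC"


def core_site(allele_at_1306):
    """Build the -1307..-1303 sequence C(C/T)ACC for the given -1306 allele."""
    return "C" + allele_at_1306.upper() + "ACC"


def mismatch_positions(seq_a, seq_b):
    """Return 1-based positions at which two equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    return [i + 1 for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]


for allele in ("C", "T"):
    site = core_site(allele)
    status = "intact" if site == SP1_BOX else "disrupted"
    print(f"-1306{allele}: {site} -> CCACC box {status}")

print("consensus vs mutant probe differ at:", mismatch_positions(SP1_CONSENSUS, SP1_MUTANT))
```

An exact string match is a deliberately crude stand-in for transcription factor binding; it only mirrors the consensus-site argument made in the text, not binding affinity.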
Sp1 Functions as an Activator of MMP-2 Promoter Activity in an Allele-specific Manner-To determine the allele-specific effects of Sp1 binding upon native promoter activity, two luciferase reporter gene constructs were generated by PCR, spanning −1691 to +10 of the MMP-2 promoter sequence, with either a T or C at the −1306 polymorphic site (Fig. 6A) and used to transfect transiently epithelial cells (293) and macrophages (RAW264.7). As shown in Fig. 6B, reporter gene expression driven by the C allelic MMP-2 promoter was ~1.6-fold greater than reporter gene expression directed by the T allelic counterpart in epithelial cells (4.0 ± 0.53 versus 2.45 ± 0.31, p < 0.001) and ~1.4-fold greater in macrophages (4.27 ± 0.29 versus 2.98 ± 0.37, p < 0.01), emphasizing the biological significance of this common variant, on promoter activity, through recruitment of Sp1.

FIG. 3. Each construct contains three concatenated 24-bp DNA oligonucleotides flanking the C or T allele. Concatenated oligonucleotides were cloned in both directions for each variant to verify that differences in allelic expression were independent of orientation and position of the regulatory element. Allelic expression, standardized for transfection efficiency, was measured as fold increase relative to the pEIF-4AI construct.

TABLE II
Mapping of MMP-2 promoter variants to potential cis-acting sequences using the TRANSFAC database (34)
Nucleotide positions are referenced relative to the major transcription start-site (28). U, unable to identify any known consensus sequence (minimum threshold of 0.85 homology); N/A, not applicable.
Variant | Promoter sequence (variant in bold) | Identical match to DNA consensus sequence
… | … | CDE (37)

TABLE III
Cells were transiently transfected with 0.5 µg of reporter plasmid and 0.1 µg of the pcDNA3-β-galactosidase construct as described under "Experimental Procedures." After 24 h of incubation, luciferase and β-galactosidase activities were determined in triplicate, and the data were standardized for transfection efficiency. Fold activity was calculated by defining the activity of the pEIF-4AI vector as 1. Data are the mean fold increase ± S.D. from at least three experiments using forward orientated cloned constructs. Statistical analysis was performed by comparing the activity of opposing allelic constructs, with the exception of the −790/−787 constructs for which activity was compared with that of the designated wild-type construct (−790T/−787T).

DISCUSSION

The common disease-common variant hypothesis takes account of the observation that the human population has relatively limited genetic diversity, such that common variants may contribute significantly to genetic risk for common disease (21,23). The human matrix metalloproteinase-2 gene (MMP-2) possesses proteolytic activity against type IV collagen, a major component of the basement membrane, and is therefore implicated in an extensive array of pathologies including atherogenesis, arthritis, and tumor growth and metastasis (11-17). The present investigation was designed to search for naturally occurring genetic variation in the MMP-2 gene, to analyze the functional effects of promoter variants on gene expression and, thereby, to generate an informative panel of polymorphisms to test for possible association with a variety of clinical phenotypes. MMP-2 spans 17 kb and contains 13 exons encoding a 72-kDa protein (28). We scanned ~3.1 kb of MMP-2 transcribed sequence, 1.9 kb of promoter sequence, and ~1 kb of intronic … Table I). Fifteen novel single base substitutions were identified as being distributed throughout the gene with six variants in the promoter (1 SNP/317 bp), one in the 5′-UTR (1 SNP/280 bp), six in the coding region (1 SNP/328 bp), one in intronic sequence (1 SNP/1000 bp), and one in the 3′-UTR (1 SNP/799 bp; Fig. 2). Collectively, these data represent a medium level of sequence diversity, reflecting sequence conservation through natural selection during evolution, which is similar to previous reports assessing the extent of molecular variation at other loci (40,41). In concordance with these studies, we found that variants in the coding region were much more prevalent at synonymous than nonsynonymous sites, with only one SNP causing an amino acid substitution (from glycine to serine at codon 456). This highly conserved amino acid is situated in the hinge region of the protein, which links the catalytic domain to the hemopexin-like domain. The physical constraints imposed on glycine by this proline-rich linker region suggest that substitution by a larger hydrophilic amino acid may have a deleterious effect upon the juxtaposition of the two domains thereby disrupting normal enzymatic activity (32). We are currently undertaking in vitro studies to address this issue. Predicting the functional consequences of the synonymous substitutions is more difficult due to the limited number of studies addressing the contribution of such variants to structural diversity of mRNA. However, a recent investigation demonstrated that single nucleotide variation at synonymous sites can give rise to allele-specific mRNA folds that ultimately possess different biological functions (42). Accordingly, it remains possible that some synonymous SNPs may not be silent neutral variants.
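The variant densities quoted in the Discussion are simple ratios of scanned length to variant count. As a minimal sketch, the promoter figure can be recomputed from the 1900 bp of promoter sequence and the six promoter variants reported in the text; the other region lengths are not stated explicitly, so they are not recomputed here. The function name is hypothetical.

```python
def bp_per_snp(region_length_bp, n_variants):
    """Average spacing between variants in a scanned region, in bp per SNP."""
    if n_variants <= 0:
        raise ValueError("at least one variant is required")
    return region_length_bp / n_variants


# Promoter: 1900 bp scanned, six variants identified.
promoter_spacing = bp_per_snp(1900, 6)

# Variant counts per region, as reported in the text.
variant_counts = {"promoter": 6, "5'-UTR": 1, "coding": 6, "intron": 1, "3'-UTR": 1}
total_variants = sum(variant_counts.values())

print(f"promoter: 1 SNP per {round(promoter_spacing)} bp; total variants: {total_variants}")
```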
Likewise, SNPs that affect gene regulation are equally important in disease risk, but it is much more difficult to segregate such variants from among the much larger pool of neutral counterparts by inspection alone, given our limited knowledge of DNA regulatory regions. To circumvent this problem we functionally characterized the entire panel of promoter polymorphisms, extending over a 2-kb interval, identified in this study. Limited functional analyses of the MMP-2 promoter have been performed previously, and although sequence analysis demonstrates the presence of multiple potential cis-acting elements, few have been characterized. We therefore generated a panel of constructs that allowed direct comparison of allelic expression in the context of the regulatory element in which the variant was located (Fig. 3). Data base analysis confirmed that three variants, at −1306, −790, and +220, mapped perfectly to consensus sequences for Sp1, GATA-1, and CDE, respectively (Table II). Transient transfection studies were performed in multiple cell types based on the applicability of each to different disease phenotypes. By using selection criteria based on previous proposals (38,39), we demonstrated that the majority of variants assessed in this manner were nonfunctional, comprising those that did not map to known consensus site(s) as well as some that did, namely variants at −790 (GATA-1) and +220 (CDE) (Table III). These results may be explained, in part, by the nature of the consensus element itself. The −790 variant maps to a degenerate site outside of the core GATA-1 consensus region, whereas the CDE is typically found in cell cycle regulator genes in association with a contiguous element, absent in the MMP-2 promoter, to form a bipartite repressor regulating gene expression. The apparent nonfunctionality of the −790 and +220 transversions is an important observation demonstrating the inherent problems that exist in predicting which SNPs in noncoding DNA will be functional.

FIG. 5. Electrophoretic mobility shift assays demonstrate that Sp1 can bind to the −1306C allele but not the −1306T allele. A, ³²P-labeled Sp1 consensus probe (Sp1cons; lanes 1-7) or −1306C probe (lanes 8-14) were incubated in the absence (lanes 1 and 8) or presence of A10 nuclear extracts (lanes 2-7 and lanes 9-14), electrophoresed, and visualized by autoradiography. Competition experiments with 100-fold excess of unlabeled Sp1 consensus probe (lanes 3 and 11), unlabeled −1306T probe (lanes 4 and 12), unlabeled −1306C probe (lanes 5 and 10), and unlabeled mutated Sp1 consensus probe (Sp1mut; lanes 6 and 13) were performed as shown. Lanes 7 and 14 depict competition experiments with a 50-fold excess of unlabeled −1306C probe. The bands representing specific DNA-protein complexes (I and II) are indicated. B, probes were prepared as above and incubated with 293 nuclear extracts either in the absence (lanes 1 and 7) or presence of antibodies to Sp1 (lanes 2 and 8), AP-2 (lanes 4 and 10), RANTES (lanes 5 and 11), or rabbit IgG (lanes 6 and 12). Lanes 3 and 9 represent anti-Sp1 antibody preincubated with anti-Sp1 antibody-specific blocking peptide prior to addition of nuclear extracts. Sp1 antibody supershifted both bands of the DNA-protein complex as indicated by the arrow. Nonspecific AP-2, RANTES, or rabbit IgG had no effect.
Our biological experiments did, however, demonstrate that the common C → T transition at position −1306, which interrupts an Sp1 site, is indeed functional. Transient transfection experiments showed that the −1306C allele increased promoter activity in two different luciferase reporter gene constructs, one in the context of the Sp1 regulatory element and the other in the background of the native MMP-2 promoter. Both showed that reporter gene expression was between ~1.4- and 2-fold higher with the C allele than the T allele (Fig. 6 and Table III). We performed EMSAs to determine whether disparity in allelic expression was attributable to the differential binding of nuclear protein(s) (Fig. 4). Two DNA-protein complexes were detected as binding to the C allele, but not the T allele, and additional competition experiments combined with supershift analysis identified the protein binding to this region as Sp1 (Fig. 5). Sp1 is a ubiquitously expressed transcription factor that binds to GC/GT-rich elements and regulates a variety of genes in a constitutive or inducible manner (43,44). One such motif, the CCACC box, has been shown to be essential for Sp1 binding and promoter function in several genes by invariably activating transcription (35,45,46). Sp1 is a multifunctional protein that can directly interact with the basal transcription complex, as recently shown for the MMP-2 proximal promoter (47), or alternatively function as a more general transcription factor and play an important role in directing tissue-specific expression (48,49). Its ability to bend DNA (50) and self-associate to loop out intervening promoter regions (51) enables it to interact with other transcription factors, such as NF-κB (52), to exert a synergistic effect essential for modulating gene activation. Clearly any variant that abolishes Sp1 binding, such as the MMP-2 −1306 polymorphism, has the potential to affect the level and specificity of gene transcription.
Indeed these results highlight the importance of the combinatorial nature of transcription factors in adjusting MMP-2 expression profiles and imply that this regulatory mechanism may be more important than previously suggested (53), thereby introducing a further level of control ahead of proenzyme activation.
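As a minimal illustration of how a single C → T substitution can remove a CCACC core match, the following Python sketch scans both strands of a short sequence for the motif. The flanking sequence and the position of the variant base within the box are assumptions made purely for illustration and are not taken from the MMP-2 promoter; only the CCACC core itself comes from the text.

```python
# Sketch: does a C -> T substitution remove an exact CCACC core match?
# The sequence below is HYPOTHETICAL and is not the MMP-2 promoter sequence.

CORE = "CCACC"  # Sp1-type CCACC box core, as named in the text

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def has_ccacc_box(seq: str) -> bool:
    """True if the CCACC core occurs on either strand of seq."""
    seq = seq.upper()
    return CORE in seq or CORE in reverse_complement(seq)


def substitute(seq: str, index: int, base: str) -> str:
    """Return seq with the base at the 0-based index replaced."""
    return seq[:index] + base + seq[index + 1:]


# Hypothetical C-allele context containing one CCACC core.
c_allele = "AGTGCCACCTGAGT"
# Assumption: the variant base is the last C of the core.
variant_index = c_allele.index(CORE) + 4
t_allele = substitute(c_allele, variant_index, "T")

print("C allele has CCACC box:", has_ccacc_box(c_allele))
print("T allele has CCACC box:", has_ccacc_box(t_allele))
```

An exact-match scan of this kind only flags candidate sites. As the nonfunctional −790 and +220 variants show, a consensus match does not by itself establish function, which is why the transfection and EMSA experiments were required.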
We believe that the MMP-2 −1306 polymorphism is a strong candidate variant for direct allelic association studies, based on several observations. First, although these findings do not preclude the possibility that functional variants exist elsewhere in the MMP-2 promoter, the functional effect of the MMP-2 −1306 polymorphism on gene expression through Sp1 binding is clear. Second, the identification of several nonfunctional variants reduces the need for multiple hypothesis testing, thereby reducing the risk of false positive association. Third, the high allele frequency of this variant (0.26) makes it more informative for association analysis, particularly in family-based tests, and also increases the power to detect linkage disequilibrium within the region. Interestingly, a T → C polymorphism in the human CYP17 gene that creates an Sp1 site (CCAC(T/C)) has been shown to be associated with polycystic ovary syndrome (54), breast cancer (55,56), and prostate cancer (57).

FIG. 6. Transient reporter gene expression assays with constructs containing full-length MMP-2 promoter. A, schematic presentation of reporter gene constructs, used in transient transfections, containing a 2-kb MMP-2 promoter with the only difference between the two constructs being a T or C at the −1306 polymorphic site. B, epithelial cells (293) and macrophages (RAW264.7) were transiently transfected as described under "Experimental Procedures." Luciferase and β-galactosidase levels were determined in triplicate and standardized for transfection efficiency; fold increase was determined by defining the activity of the empty pGL3 Basic vector as 1. Data shown are mean fold increase ± S.D. from eight independent experiments. **, p < 0.01; ***, p < 0.001.
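The normalization described in the legend to Fig. 6 can be sketched as follows, assuming that "standardized for transfection efficiency" means dividing each luciferase reading by the β-galactosidase reading from the same well. That interpretation, the function names, and all numerical readings below are assumptions for illustration; the numbers are arbitrary placeholders and not data from this study.

```python
# Sketch of reporter-assay normalization under the stated assumptions.
# All readings are ARBITRARY PLACEHOLDERS, not measurements from the paper.
from statistics import mean, stdev


def normalized_activity(luciferase, beta_gal):
    """Mean luciferase/beta-galactosidase ratio over replicate wells."""
    return mean(luc / gal for luc, gal in zip(luciferase, beta_gal))


def fold_increase(construct_wells, empty_vector_wells):
    """Activity relative to the empty vector, whose activity is defined as 1."""
    return (normalized_activity(*construct_wells)
            / normalized_activity(*empty_vector_wells))


def summarize(fold_values):
    """Mean and sample standard deviation across independent experiments."""
    return mean(fold_values), stdev(fold_values)


# One hypothetical experiment: (luciferase readings, beta-gal readings),
# each measured in triplicate.
empty_vector = ([10.0, 20.0, 5.0], [1.0, 2.0, 0.5])
construct = ([40.0, 80.0, 20.0], [1.0, 2.0, 0.5])

fold = fold_increase(construct, empty_vector)

# Combine with two further hypothetical experiments to report mean and S.D.
mean_fold, sd_fold = summarize([fold, 3.0, 5.0])
print(f"fold increase (this experiment): {fold:.2f}")
print(f"mean fold increase: {mean_fold:.2f}, S.D.: {sd_fold:.2f}")
```

Dividing by the co-transfected β-galactosidase signal removes well-to-well differences in transfection efficiency before constructs are compared, and expressing activity relative to the empty vector puts independent experiments on a common scale.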
In summary, we have described the nature and extent of nucleotide variation in the human MMP-2 gene by identifying 15 novel SNPs distributed throughout the coding and noncoding regions. We have characterized an unreported functional promoter polymorphism that produces an allele-specific effect on expression through abolishing binding of the transcription factor Sp1. We believe this systematic approach of characterizing functional variants in the regulatory regions of important candidate genes will facilitate the rapid elucidation of common disease susceptibility, or resistance, variants. These studies indicate that the MMP-2 −1306 polymorphism will be informative in tests of association in a wide spectrum of pathologies in which a role for MMP-2 is implicated. Such studies will test the robustness of the common disease-common variant hypothesis and improve our understanding of SNP involvement in complex genetic disease.