Organization of the gene encoding the human endothelin-converting enzyme (ECE-1).

The two human endothelin-converting enzyme (ECE-1) isoforms, which differ in their N-terminal regions, are encoded by a single gene. The gene is composed of 19 exons that span more than 68 kilobases and has been mapped to band 1p36 of the human genome. The two isoform mRNAs display different tissue distributions. Their precursors are transcribed from two distinct start sites, located upstream from exon 1 and exon 3, respectively. Sequence analysis of the two putative promoters revealed motifs characteristic of binding sites for several transcription factors. Comparison of the ECE-1 gene structure with those of other zinc metalloproteases, together with a phylogenetic analysis, confirms the existence of a metalloprotease subfamily composed of ECE-1, ECE-2, neutral endopeptidase, Kell blood group protein, and two bacterial enzymes.

Endothelin, first isolated from cultured porcine endothelial cells (1), has opened a new field of research. Three distinct genes encode three isopeptides, endothelin-1 (ET-1),¹ endothelin-2 (ET-2), and endothelin-3 (ET-3), which form the endothelin family (2). ET-1 is regarded as the most potent vasoconstrictor known at present. Measurements of endothelin concentrations in biological fluids and studies using endothelin receptor antagonists have pointed to an important role of endothelin in a number of pathophysiological situations, including subarachnoid hemorrhage (3, 4), chronic heart failure (5, 6), and hypertension (7, 8). Furthermore, recent targeted disruptions of the ET-1 (9), ET-3 (10), and ETB receptor (11) genes have demonstrated the importance of the endothelin system during embryogenesis, especially for the development of neural crest-derived tissues.
To form the active endothelins, big endothelins must be cleaved at the Trp21-Val/Ile22 bond. An endopeptidase specific for this cleavage was postulated (1) and named endothelin-converting enzyme (ECE). Complementary DNAs coding for two ECEs have recently been isolated (12-18), and the corresponding proteins have been termed ECE-1 (13) and ECE-2 (18). Both enzymes are membrane-bound zinc metalloendopeptidases with a single transmembrane domain that defines a short N-terminal cytoplasmic tail and a large C-terminal extracellular domain, which contains the active site. ECE-1 has been shown to be located mainly at the plasma membrane (19), whereas the acidic pH optimum of ECE-2 suggests an intracellular localization. Moreover, the wide and abundant tissue distribution of ECE-1 mRNA favors the hypothesis that this enzyme is mainly responsible for the cleavage of big endothelins.
We have previously reported the cloning of two human ECE-1 isoforms,² both of which displayed endothelin-converting activity. These two enzymes, tentatively termed ECE-1a and ECE-1b, differ in their N termini. Sequences of the two corresponding cDNAs have since been published separately (15-17). We present here the structure of the ECE-1 gene and its chromosomal localization, and show that ECE-1a and ECE-1b are encoded by the same gene through the use of two promoters separated by approximately 11 kb. We also analyze the evolutionary relationship between ECE-1 and several other zinc metalloproteases by comparing their sequences and gene structures.
Characterization of the Intronic Organization of the Gene. DNA fragments corresponding to introns 5, 6, 7, 9, 11, 12, 13, 15, 16, and 18 were amplified by PCR using phage DNA as a template, subcloned into the plasmid pCRII (Invitrogen), and analyzed to determine the exon-intron boundaries. The other boundaries were obtained by subcloning and sequencing convenient restriction fragments (containing exons 1, 2, 3, and 4) or by directly sequencing phage DNA with specific primers and the Femtomole Sequencing kit (Promega).
Northern Analysis. Total RNA (10 µg) from ECV304 cells (20) and from human umbilical vein endothelial cell primary cultures (HUVEC) was fractionated on a 1.2% (w/v) agarose gel containing 20 mM MOPS, 5 mM sodium acetate, 1 mM EDTA, and 1.8% (v/v) formaldehyde and transferred overnight to Nylon N+ (Amersham) in 20× SSC (1× SSC is 150 mM sodium chloride, 15 mM sodium citrate dihydrate). Two blots, each displaying mRNA (2 µg) from eight human tissues, were purchased …
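As a worked example of the buffer definitions above, the sketch below converts the stated 1× SSC composition into the mass of each salt needed for a given volume of 20× SSC, and the 1.2% (w/v) agarose into grams for a given gel volume. The molar masses (58.44 g/mol for NaCl, 294.10 g/mol for trisodium citrate dihydrate) are standard values added here; the function names and the 150-ml gel volume are illustrative assumptions, not part of the original protocol.

```python
# Sketch: convert the stated buffer compositions into weigh-out masses.
# Assumption: NaCl and trisodium citrate dihydrate are the salts weighed out.
MOLAR_MASS_G_PER_MOL = {
    "NaCl": 58.44,
    "trisodium citrate dihydrate": 294.10,
}

# 1x SSC as defined in the text: 150 mM NaCl, 15 mM sodium citrate dihydrate.
SSC_1X_MOL_PER_L = {
    "NaCl": 0.150,
    "trisodium citrate dihydrate": 0.015,
}


def ssc_recipe(fold: float, volume_l: float = 1.0) -> dict:
    """Grams of each salt needed for `volume_l` litres of `fold`x SSC."""
    return {
        salt: conc * fold * volume_l * MOLAR_MASS_G_PER_MOL[salt]
        for salt, conc in SSC_1X_MOL_PER_L.items()
    }


def agarose_grams(percent_w_v: float, volume_ml: float) -> float:
    """Grams of agarose for a gel of `percent_w_v` % (w/v) in `volume_ml` ml."""
    return percent_w_v * volume_ml / 100.0


if __name__ == "__main__":
    for salt, grams in ssc_recipe(fold=20).items():
        print(f"20x SSC, 1 l: {grams:.2f} g {salt}")
    # The 150-ml gel volume is a hypothetical example.
    print(f"1.2% gel, 150 ml: {agarose_grams(1.2, 150):.2f} g agarose")
```

For 1 l of 20× SSC this gives about 175.3 g NaCl and 88.2 g trisodium citrate dihydrate, the usual weigh-out for this buffer.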
The nucleotide sequences reported in this paper have been submitted to the GenBank™/EMBL Data Bank with accession numbers X91922-X91939.
In Situ Hybridization on Chromosomes. Hybridization was carried out on chromosome preparations obtained from phytohemagglutinin-stimulated human lymphocytes cultured for 72 h. 5-Bromodeoxyuridine (60 µg/ml of medium) was added for the final 7 h of culture to ensure good-quality posthybridization chromosomal banding. An ECE-1 cDNA fragment (nucleotides 269-2377),³ subcloned in pBluescript KSII (Stratagene), was tritium-labeled by nick translation to a specific activity of 10⁸ dpm/µg. The radiolabeled probe was hybridized to metaphase spreads at a final concentration of 100 ng/ml of hybridization solution as described previously (21). After coating with nuclear track emulsion (Kodak NTB2), the slides were exposed for 20 days at 4 °C. To avoid any slipping of silver grains during the banding procedure, chromosome spreads were first stained with buffered Giemsa solution, and metaphases were photographed. R-banding was then performed by the fluorochrome-photolysis-Giemsa method, and metaphases were rephotographed before analysis.
SLIC. The 5′ ends of ECE-1 mRNAs were determined using a method called SLIC (22). In brief, first-strand cDNA was synthesized from ECV304 or HUVEC total RNA (5 µg) with two different ECE-1 primers (5′-TGATGATTGCTTGGTTGTG-3′ and 5′-AGGCGTAGCTGAAGAAGTC-3′). The purified single-stranded cDNAs were ligated at their 3′ ends to a 50-residue oligonucleotide and used as templates for nested PCR with two successive primer sets. Each set comprised an ECE-1-specific primer (5′-CGCCAGAAGTACCACCAACACCACC-3′ and 5′-AGCCGCTTCTCCACCTGGGTCCGTG-3′) and a 20-residue primer complementary to the 50-residue oligonucleotide. The PCR products were probed by Southern blotting with a primer located further upstream (5′-GAACTTCCACAGCCCCCGGAGT-3′). Products giving a strong positive signal were subcloned into the pCRII plasmid vector (Invitrogen), and 12 ECV304 and 15 HUVEC clones were sequenced.
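The oligonucleotides quoted above can be checked quickly for base composition. The sketch below computes GC content, a rough Wallace-rule melting temperature (2 °C per A/T plus 4 °C per G/C, an approximation best suited to short oligonucleotides), and the reverse complement, which for an antisense primer is the mRNA-sense sequence it targets. The dictionary labels are descriptive names added here; the sequences are copied from the text, and nothing in the sketch reproduces how the authors designed the primers.

```python
# Sketch: basic composition checks on the SLIC oligonucleotides quoted in the text.
PRIMERS = {  # labels are illustrative; sequences are from the Methods
    "first-strand primer A": "TGATGATTGCTTGGTTGTG",
    "first-strand primer B": "AGGCGTAGCTGAAGAAGTC",
    "nested PCR primer A": "CGCCAGAAGTACCACCAACACCACC",
    "nested PCR primer B": "AGCCGCTTCTCCACCTGGGTCCGTG",
    "Southern blot probe": "GAACTTCCACAGCCCCCGGAGT",
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def gc_fraction(seq: str) -> float:
    """Fraction of G + C bases in a DNA sequence."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq: str) -> int:
    """Rough melting temperature (deg C) by the Wallace rule: 2*(A+T) + 4*(G+C)."""
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        print(f"{name:24s} {len(seq):2d} nt  GC {100 * gc_fraction(seq):4.1f}%  "
              f"Tm~{wallace_tm(seq)} C  rc {reverse_complement(seq)}")
```

The Wallace estimate overstates the melting temperature of the 22- to 25-mers, so it is only useful here for comparing the primers with one another.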
RNase Protection. Total RNA (10 µg) from HUVEC and ECV304 cells was protected as described previously (23) with two ³²P-labeled antisense RNAs specific for ECE-1a and ECE-1b. The two probes were transcribed in vitro from two cloned human gene fragments (nucleotides −a260 to −a83 and −b31 to b265 in Fig. 4A). Protected RNA was resolved on a sequencing gel, with a known DNA sequence and RNA markers used for size determination.

RESULTS AND DISCUSSION
To isolate the gene encoding ECE-1, a human genomic library was screened with ECE-1 cDNA fragments or oligonucleotides as hybridization probes. A total of 59 clones were positive. Six clones, encompassing the major part of the gene, were selected and analyzed. The ECE-1 gene spans more than 68 kb and contains 19 exons (Fig. 1A). Almost all exon/intron boundaries display the consensus splice donor and acceptor sequences (Fig. 4B). Two nucleotide differences, which could reflect allelic variation, were detected between the genomic coding sequence and the various published cDNA sequences: a silent C-to-T substitution at position 1119³ and a G-to-C substitution at position b5 of the ECE-1b sequence (Fig. 4A), which replaces an arginine with a proline.
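The statement that a G-to-C change at position b5 turns an arginine into a proline can be checked against the standard genetic code. Assuming that b1 is the A of the ECE-1b initiator ATG and that b1 to b6 lie in a single exon (the numbering of Fig. 4A suggests this, but the sketch does not verify it), b5 is the second base of codon 2. The sketch below builds the standard codon table and lists the arginine codons that become proline codons under this substitution; the helper names are illustrative.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (first, second, third base).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codon_of(position: int) -> tuple:
    """Map a coding position (1 = A of ATG) to (codon number, base in codon), 1-based."""
    return (position - 1) // 3 + 1, (position - 1) % 3 + 1


def substitute(codon: str, base_index: int, new_base: str) -> str:
    """Replace the base at 1-based `base_index` of a codon."""
    return codon[: base_index - 1] + new_base + codon[base_index:]


def arg_codons_giving_pro(base_index: int, old_base: str, new_base: str) -> list:
    """Arginine codons with `old_base` at `base_index` that encode proline
    once that base is replaced by `new_base`."""
    return [
        codon for codon, aa in CODON_TABLE.items()
        if aa == "R"
        and codon[base_index - 1] == old_base
        and CODON_TABLE[substitute(codon, base_index, new_base)] == "P"
    ]


if __name__ == "__main__":
    codon_no, base_no = codon_of(5)
    print(f"b5 -> codon {codon_no}, base {base_no}")   # codon 2, base 2
    print(arg_codons_giving_pro(base_no, "G", "C"))    # ['CGT', 'CGC', 'CGA', 'CGG']
```

Under these assumptions, only a CGN codon at codon 2 is compatible with the reported Arg-to-Pro change; the AGA and AGG arginine codons would become threonine codons instead.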
Two isoforms of human ECE-1 have been characterized.² ECE-1a (758 residues) and ECE-1b (770 residues) differ only in their N termini and share the same C-terminal 726 residues. Sequences of the corresponding human mRNAs have been published separately (15-17). Northern analysis (Fig. 3A) shows that these two mRNAs are widely expressed in human tissues but that their relative abundance varies. ECE-1b signals were stronger than ECE-1a signals in pancreas, peripheral blood leukocytes, prostate, testis, colon, and ECV304 cells, whereas lung, spleen, placenta, small intestine, and HUVEC exhibited stronger ECE-1a signals. The relative levels of the two isoform mRNAs should be assessed by a more quantitative method such as RNase protection.
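Because both isoforms share the same C-terminal 726 residues, the lengths of their isoform-specific N-terminal segments follow by subtraction:

```latex
\begin{aligned}
\text{ECE-1a-specific N terminus} &= 758 - 726 = 32 \ \text{residues},\\
\text{ECE-1b-specific N terminus} &= 770 - 726 = 44 \ \text{residues},\\
\text{difference between isoforms} &= 44 - 32 = 770 - 758 = 12 \ \text{residues}.
\end{aligned}
```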
To check that the two isoforms were produced by the same gene, a genomic Southern blot was hybridized with a short ECE-1 cDNA fragment (nucleotides 1654-2086,³ corresponding to exons 15-18). Two EcoRI fragments (approximately 3 and 10 kb) and a 9-kb BamHI fragment were detected, in agreement with the restriction map of the ECE-1 gene. The presence of a single ECE-1 gene was confirmed by chromosomal in situ hybridization (Fig. 2): in the 150 metaphase cells examined after hybridization with an ECE-1 cDNA probe, 91 of the 212 silver grains associated with chromosomes were located on chromosome 1. The distribution of grains on this chromosome was not random, since 75 (82.4%) of them mapped to band p36 of the short arm. From these results, we conclude that the two isoform mRNAs originate from a single gene located in band 1p36 of the human genome.
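The grain counts above can be turned into a simple significance check. The sketch below recomputes the quoted percentage and asks how likely it would be to see at least 91 of 212 grains on chromosome 1 if grains fell at random in proportion to chromosome length. The figure of about 8% for chromosome 1's share of the haploid genome is an approximate value assumed here, and the binomial model is an illustration, not the analysis used by the authors.

```python
from math import comb


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), summed term by term."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


TOTAL_GRAINS = 212   # silver grains on chromosomes (from the text)
CHR1_GRAINS = 91     # grains on chromosome 1 (from the text)
P36_GRAINS = 75      # grains on band 1p36 (from the text)
CHR1_SHARE = 0.08    # ASSUMPTION: chromosome 1 is ~8% of haploid genome length

if __name__ == "__main__":
    print(f"{100 * CHR1_GRAINS / TOTAL_GRAINS:.1f}% of grains on chromosome 1")    # 42.9%
    print(f"{100 * P36_GRAINS / CHR1_GRAINS:.1f}% of those on 1p36")               # 82.4%
    print(f"expected on chromosome 1 by chance: {TOTAL_GRAINS * CHR1_SHARE:.1f}")  # 17.0
    print(f"P(>= {CHR1_GRAINS} grains by chance) = "
          f"{binomial_upper_tail(CHR1_GRAINS, TOTAL_GRAINS, CHR1_SHARE):.1e}")
```

The tail probability is vanishingly small for any plausible value of the chromosome 1 share, which is why the clustering is read as a single specific locus.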
RNase protection assays, in combination with the SLIC method (22), which allows cloning of primer extension products, revealed two different transcription start sites (Figs. 3B and 4A) corresponding to the two ECE-1 isoforms. All ECV304 SLIC clones started at the same site, located eight nucleotides upstream from the ECE-1b ATG (nucleotide −b8). This start point was also detected by RNase protection and is consistent with the first nucleotide (−b9) of a published ECE-1b cDNA (17). RNase protection also revealed more proximal signals (positions b8, b10, and b18). RNAs transcribed from these putative start sites would encode a shorter protein of 753 residues, initiated at a downstream ATG codon (position b258). The possible existence of such a third isoform remains to be investigated. A major ECE-1a transcription start point was detected by RNase protection at position −a213. HUVEC SLIC clones started at various positions spread over a 94-bp region (positions −a212 to −a119 in Fig. 4A). This could reflect minor transcription start points, consistent with the additional faint signals in the protection assay.
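Positions in Fig. 4A carry a minus sign upstream of their reference nucleotide. Assuming the usual convention that the A of the ATG is +1 and that there is no position 0, the sketch below converts such positions into lengths. It reproduces the 94-bp spread of the HUVEC start sites and gives the sizes of the genomic fragments used as RNase protection templates; the function names are illustrative.

```python
def to_offset(position: int) -> int:
    """Map a position in a zero-less numbering (..., -2, -1, 1, 2, ...) onto a
    continuous axis on which adjacent nucleotides differ by exactly 1."""
    if position == 0:
        raise ValueError("position 0 does not exist in this numbering")
    return position if position < 0 else position - 1


def span_length(start: int, end: int) -> int:
    """Number of nucleotides from `start` to `end`, both included."""
    if to_offset(end) < to_offset(start):
        raise ValueError("start must not lie downstream of end")
    return to_offset(end) - to_offset(start) + 1


if __name__ == "__main__":
    print(span_length(-212, -119))  # 94: spread of HUVEC SLIC start sites
    print(span_length(-260, -83))   # 178: ECE-1a probe template
    print(span_length(-31, 265))    # 296: ECE-1b probe template (no position 0)
```

Only the last value depends on the no-zero assumption; a numbering with a position 0 would add one nucleotide to spans that cross the ATG.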
The organization of the ECE-1 gene and the localization of its transcription start sites explain the N-terminal duality of ECE-1 (Fig. 1B). Exons 1 and 2 encode the ECE-1b-specific sequence, and synthesis of ECE-1b mRNA involves splicing out the ECE-1a-specific part of exon 3 (up to nucleotide a102). Such an organization suggests the presence of two isoform-specific promoters, located upstream from exon 1 and upstream from exon 3, respectively. A similar arrangement, which also gives rise to two enzyme isoforms, exists for the glucokinase gene (24).
The regions located upstream from exon 1 and exon 3 (Fig. 4A), which should direct transcription of ECE-1b and ECE-1a RNAs, respectively, display many putative binding sites for known transcription factors (25). The region surrounding the ECE-1b start point presents features characteristic of a housekeeping gene promoter: it lacks both CAAT and TATA boxes, is GC-rich, and contains many SP1 and AP2 consensus sites. Several of these motifs are located in the small first intron, which may therefore play a role in transcription. The putative ECE-1b promoter also contains three shear-stress response motifs, a putative binding site for the ISGF3 factor, and an inverted GATA box. Transcription factor GATA-2 plays an important role in transcription of the preproendothelin-1 gene (26), and shear-stress response elements are present in several endothelial genes (27, 28). The ECE-1a transcription start region does not display housekeeping gene features but contains a CAAT box and potential binding sites for glucocorticoid receptors and for the NF-κB, PU.1, AP1, AP2, and c-ets1 transcription factors, as well as one shear-stress and three acute-phase (29) response elements. The proto-oncogene c-ets1 has been shown to be expressed in endothelial cells during angiogenesis and tumor vascularization (30). The biological relevance of these sites remains to be investigated, especially with respect to differential transcriptional regulation of the ECE-1a and ECE-1b isoforms.
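The promoter description above rests on consensus-motif searches and base composition. A minimal sketch of that kind of scan follows: it reports GC content and the CpG observed/expected ratio, and counts matches to a few simplified consensus cores on both strands. The motif strings (GGGCGG for an SP1 GC box, CCAAT, TATAWAW, WGATAR, and GAGACC as a shear-stress response element core) are simplifications chosen for illustration; the authors searched a transcription factor database (25), and this regex scan does not reproduce that search. The example sequence is hypothetical, not the ECE-1 promoter.

```python
import re

# IUPAC codes needed by the consensus strings below.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "[AG]", "Y": "[CT]", "W": "[AT]", "S": "[CG]", "N": "[ACGT]"}

# Simplified consensus cores (illustrative assumption, not a curated database).
MOTIFS = {
    "SP1 GC box": "GGGCGG",
    "CAAT box": "CCAAT",
    "TATA box": "TATAWAW",
    "GATA": "WGATAR",
    "SSRE core": "GAGACC",
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def count_sites(seq: str, consensus: str) -> int:
    """Matches on the given strand plus the reverse strand. A lookahead lets
    overlapping sites count; a palindromic site is counted once per strand."""
    pattern = re.compile("(?=" + "".join(IUPAC[b] for b in consensus) + ")")
    return len(pattern.findall(seq)) + len(pattern.findall(reverse_complement(seq)))


def gc_fraction(seq: str) -> float:
    return (seq.count("G") + seq.count("C")) / len(seq)


def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio: CpG count * length / (C count * G count)."""
    c, g = seq.count("C"), seq.count("G")
    return seq.count("CG") * len(seq) / (c * g) if c and g else 0.0


def describe_promoter(seq: str) -> dict:
    seq = seq.upper()
    summary = {"GC": gc_fraction(seq), "CpG o/e": cpg_obs_exp(seq)}
    summary.update({name: count_sites(seq, motif) for name, motif in MOTIFS.items()})
    return summary


if __name__ == "__main__":
    toy = "GGGGCGGGGCCGCCCGAGACCTTCCAATCAGATAAGG"  # hypothetical sequence
    for key, value in describe_promoter(toy).items():
        print(key, value)
```

Short consensus cores match often by chance in GC-rich DNA, which is one reason the text treats such sites as putative until tested experimentally.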
Among all known zinc metalloproteases, only two bacterial and three mammalian enzymes display a sequence homology with ECE-1 significant enough to suggest that the six enzymes originate from a common ancestor. The three mammalian enzymes are ECE-2 (18), neutral endopeptidase (NEP) (31), and Kell blood group protein (Kell) (32). The Lactococcus lactis PepO (33) and a Streptococcus gordonii metalloendopeptidase⁴ lack a transmembrane domain and are somewhat shorter (by approximately 100 residues) than the other enzymes. A phylogenetic tree constructed from the peptide sequences (Fig. 5B) depicts the evolutionary relationships within this enzyme family. A second tree, deduced from a comparison of the 120-residue regions surrounding the zinc-binding motif (Fig. 5C), exchanges the positions of the bacterial enzymes and Kell. This reflects a surprisingly high identity within this 120-residue domain between the bacterial endopeptidases and their mammalian relatives (50% amino acid identity between ECE-1 and PepO, versus 25% over the whole sequence). Such a high conservation may indicate that this region is subject to severe evolutionary constraints and that many of its residues are necessary to maintain the proper folding of the zinc metalloprotease active site. The genes encoding NEP and Kell have been characterized previously (34, 35). The similar organization of the three genes (Fig. 5A) confirms that they belong to the same family and further indicates that the mammalian genes diverged after the divergence of eukaryotes and prokaryotes. The NEP and Kell genes have also been mapped to human chromosomes 3q21-27 (36) and 7q33 (37), respectively, and the ECE-1 gene is located on chromosome 1p36 (Fig. 2).

⁴ P. E. Kolenbrander, unpublished data.

FIG. 2. Chromosomal localization of the human ECE-1 gene. Diagram of the G-banded human chromosome 1 illustrating the distribution of labeled sites after hybridization with an ECE-1 cDNA probe.

FIG. 3. Northern analysis (A) and RNase protection (B). A, each sample contains 2 µg of poly(A)+ RNA, except for HUVEC and ECV304 cells (10 µg of total RNA). The blots were hybridized with a probe common to both isoforms (ECE-1) and with probes specific for each isoform (ECE-1a and ECE-1b), revealing a major signal of approximately 4.8 kb. A minor transcript of approximately 3.5 kb can also be detected in ovary. Hybridization to a human β-actin probe is shown as a control for sample integrity. Sk. Muscle, PBL, and S. Intestine refer to skeletal muscle, peripheral blood leukocytes, and small intestine, respectively. B, total RNAs (10 µg) from HUVEC (lanes 2 and 4), ECV304 cells (lanes 1 and 5), and yeast (lanes 3 and 6) were protected with ³²P-labeled antisense RNAs transcribed from two genomic fragments corresponding to nucleotides −a260 to −a83 (ECE-1a; lanes 1-3) and −b31 to b265 (ECE-1b; lanes 4-6) in Fig. 4A. The approximate (±1 nucleotide) positions of the signals in Fig. 4A are indicated.

FIG. 5. Comparison of ECE-1, NEP, and Kell gene structures (A) and phylogenetic analysis (B and C). A, the coding regions of the three cDNAs are shown interrupted at the exon/intron boundaries. Separate symbols mark the boundaries exactly conserved among the three genes and those conserved between ECE-1 and Kell only. Positions of the nucleotides encoding the transmembrane domains (TM) and the zinc-binding motifs (HEXXH) are indicated. B, phylogenetic tree of the ECE-Kell-NEP family, constructed with the GCG (Genetics Computer Group) package. h, human; b, bovine; r, rat; Strep, an uncharacterized metalloprotease of S. gordonii. GenBank/EMBL accession numbers of the sequences used are as follows: rNEP, P07861; hNEP, P08473; hECE-1, S47269; rECE-1, A53679; bECE-1, S47268; bECE-2, U27341; hKell, P23276; PepO, A47098; and Strep, L11577. Sequences were aligned, and only the peptide regions that could be aligned with the PepO sequence were retained for the analysis; these corresponded roughly to the extracellular parts of the mammalian enzymes (e.g. residue 77 to the C terminus for hNEP). C, phylogenetic tree of the ECE-Kell-NEP family, this time using the peptide stretches around the HEXXH motif that could be aligned with residues 534-660 of hNEP.
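The phylogenetic comparison above rests on two computations: percent identity between aligned sequences (50% versus 25% for ECE-1 and PepO) and tree building from the alignment. The text says only that the trees were built with the GCG package, without naming the method, so the sketch below swaps in UPGMA, a simple distance-based clustering, applied to distances of 1 minus fractional identity. The toy alignment is hypothetical and is not derived from the accession numbers listed in the Fig. 5 legend. The output is a Newick topology without branch lengths.

```python
from itertools import combinations


def percent_identity(a: str, b: str) -> float:
    """Identity over aligned columns where neither sequence has a gap ('-')."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable columns")
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def distance_matrix(aligned: dict) -> dict:
    """Pairwise distances 1 - identity/100, keyed by frozenset of two names."""
    return {
        frozenset({x, y}): 1.0 - percent_identity(aligned[x], aligned[y]) / 100.0
        for x, y in combinations(aligned, 2)
    }


def upgma(names: list, dist: dict) -> str:
    """UPGMA clustering; returns a Newick topology string (no branch lengths)."""
    sizes = {name: 1 for name in names}   # cluster label -> number of leaves
    d = dict(dist)
    while len(sizes) > 1:
        i, j = sorted(min(d, key=d.get))  # closest pair of clusters
        si, sj = sizes.pop(i), sizes.pop(j)
        merged = f"({i},{j})"
        for k in sizes:                   # size-weighted average distance
            d[frozenset({merged, k})] = (
                si * d[frozenset({i, k})] + sj * d[frozenset({j, k})]
            ) / (si + sj)
        d = {pair: v for pair, v in d.items() if i not in pair and j not in pair}
        sizes[merged] = si + sj
    return next(iter(sizes))


if __name__ == "__main__":
    aligned = {  # hypothetical toy alignment, not real ECE/NEP/Kell/PepO sequences
        "ECE1": "HEITHGFD",
        "ECE2": "HEITHGFN",
        "NEP": "HEITHAFD",
        "PepO": "HELSHAYE",
    }
    print(percent_identity(aligned["ECE1"], aligned["PepO"]))
    print(upgma(list(aligned), distance_matrix(aligned)) + ";")
```

Restricting the alignment to a window, as was done for the 120-residue region around HEXXH, only changes which columns are passed to `percent_identity`; the tree step is unchanged.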
To date, at least two other gene families link the chromosomal regions carrying the ECE-1, NEP, and Kell genes: the genes encoding carboxypeptidases A1 (38) and A3⁵ are located on 7q32 and 3q21-25, respectively, and the paired box homeotic genes PAX4 and PAX7 are located on 7q32 and 1p36, respectively (39, 40). These gene positions may be remnants of large DNA duplications that were at the origin of the diversification of the ECE-Kell-NEP family.