Analysis of the Regulation of the A33 Antigen Gene Reveals Intestine-specific Mechanisms of Gene Expression*


The A33 antigen is a transmembrane protein expressed almost exclusively by intestinal epithelial cells. The level of its expression is robust and uniform throughout the rostrocaudal axis of the human and mouse intestines. In the colon, strong expression is found in the basolateral membranes of both the proliferating cells in the lower regions of the crypts and the differentiating cells in the upper regions of crypts. Similarly, in the small intestine, the protein is highly expressed by all the epithelial cells in the crypts and by the differentiated cells migrating over the villi. Thus, the A33 antigen has emerged as a definitive marker for all intestinal epithelial cells, irrespective of cell lineage and differentiation status. To understand the molecular mechanisms mediating this rare tissue-specific expression pattern, we undertook a comprehensive analysis of the 5′-regulatory region of the human A33 antigen gene. This allowed us to point to positive cis-regulatory elements incorporating consensus Krüppel-like factor and caudal-related homeobox (CDX)-binding sites, located just upstream from the human A33 antigen transcription start site, as being important for the intestine-specific expression pattern of this gene. Further analysis provided evidence that the A33 antigen gene may be one of only a few target genes to be described thus far for the intestine-specific homeobox transcription factor, CDX1. Taken together, our data lead us to propose that the activity of CDX1 is pivotal in mediating the exquisite, intestine-specific expression pattern of the A33 antigen gene.
Efforts to understand the mechanisms underlying tissue-specific patterns of gene expression have focused on the recognition of conserved regulatory sequences within tissue-specific promoters and the identification of transcription factors that are expressed in a tissue-specific manner. In some tissues, the acquisition of specific characteristics appears to be governed by a single tissue-specific transcription factor (or family of transcription factors) responsible for the activation of multiple target genes (1-6). Thus, a number of "master genes" encoding transcription factors that appear to determine the differentiation of the entire tissue have emerged. These include the runt family transcription factor, Cbfa-1, which is required for osteoblast differentiation (1, 2), and the myogenic regulatory network (comprising the helix-loop-helix transcription factors, MyoD, Myf-5, Myf-6, and myogenin) responsible for specifying muscle (3-6). Whether similar mechanisms govern the specialized patterns of gene expression seen in other tissues is currently under intense investigation.
The intestinal epithelium is a complex and dynamic tissue containing migrating cells that undergo phases of proliferation, differentiation, and apoptosis over the course of a few days (7). Such diverse cellular processes probably require transient patterns of gene expression. In many instances, cells positioned in the proliferative compartment in the lower regions of the crypts in the colon and small intestine express different genes compared with cells differentiating in the upper regions of the crypts and those in the small intestine migrating along the villi. The caudal-related family of transcription factors, which includes CDX1 and CDX2, appears to differentially regulate the expression of many genes that contribute to intestinal function (8-15).
To provide further insights into mechanisms employed to maintain intestine-specific patterns of gene expression, we investigated the regulation of expression of the human and mouse genes encoding the intestine-specific A33 antigen (16-18). The human and mouse A33 antigens are type I transmembrane glycoproteins of the immunoglobulin superfamily (17, 18) that share a strikingly similar and highly restricted pattern of expression. In both species, A33 antigen is expressed in epithelial cells throughout the rostrocaudal axis of the small and large intestines (16, 18), where the protein appears to play a role in modulating the gut immune system. In all cells the protein is specifically trafficked to basolateral membranes and is absent from apical membranes (18). An interesting feature of A33 antigen expression is its robust and uniform appearance in both the proliferating immature cells in the lower regions of crypts and the postmitotic, differentiating cells in the upper regions of crypts and throughout the small intestinal villi (16, 18). In addition to strong intestinal expression, mouse A33 antigen (mA33 antigen) expression has also been detected in the stomach pyloric epithelium and bladder urothelium (18).
The pattern of intestinal expression exhibited by the A33 antigen appears to be unique among intestine-specific proteins, as shown schematically in Fig. 1. The drawing depicts the small intestine, cecum, and colon represented as a linear tube with gradients of gene expression indicated for a number of intestine-specific genes. For the A33 antigen, solid bars indicate uniform expression throughout the intestine, whereas other genes, such as Cdx1, Cdx2, intestinal (iFabp) and liver (lFabp) fatty acid-binding protein genes, carbonic anhydrase 1 (Ca1), villin, and dra (down-regulated in adenoma) are differentially expressed (8, 19-22). The gradients (Fig. 1A) indicate differential expression along the rostrocaudal axis of the intestine, whereas the vertical bars indicate differential expression depending on the position of the cell along the crypt-villus axis in the small intestine (Fig. 1B) and along the crypt axis in the colon (Fig. 1C).
This background prompted us to undertake a study of the human A33 (hA33) antigen gene with the aim of delineating regions of the promoter that may mediate its unique pattern of expression in vivo. Here we describe the cloning, chromosomal localizations, exon-intron structures, and transcription start sites of both the hA33 and mA33 antigen genes. A comprehensive in vitro characterization of the 5′-regulatory region of the hA33 antigen gene is presented that allows us to point to positive cis-regulatory elements located just upstream from the transcription start site as being important for the intestine-specific expression pattern of this gene. Our results provide evidence that the A33 antigen gene may be one of a few target genes to be described thus far for the intestine-specific homeobox transcription factor, CDX1 (24-26).
Cloning and Characterization of the hA33 Antigen Gene and 5′-Flanking Sequence-The exon-intron structure of the hA33 antigen gene was assembled from genomic clones, and PCR products were amplified from human genomic DNA. Approximately 10^6 clones of a human placental genomic DNA library in Lambda FIX II (Stratagene, La Jolla, CA) were screened with a 2.6-kb hA33 antigen cDNA probe (17) labeled with [α-32P]dCTP (30 Ci/mmol; Amersham Biosciences) using the Megaprime DNA labeling system (Amersham Biosciences). The insert from one putatively positive clone (SW1; size, ~17 kb) was excised using NotI and subcloned into pBluescript KS+ (Stratagene). In addition, PCRs of human genomic DNA with exon-specific primers designed to amplify individual introns were carried out using the EXPAND 20kb Plus PCR system (Roche Molecular Biochemicals). Each exon-intron boundary and all of introns 5 and 6 were sequenced on both strands. To obtain 5′-flanking sequences of the hA33 antigen gene, a 488-bp probe was generated by PCR using forward primer 2843 (5′-TTTCCCAGTGAGCTCTCTCT) and reverse primer 2844 (5′-TTGAGTTGGGTTCTGTGACT) to amplify a region in intron 1 in clone SW1. One putatively positive clone (28A) was obtained from a screen of the human placental genomic library, and the insert (size, ~17 kb) was excised with NotI and subcloned into pBluescript (KS+).
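As a quick sanity check on the probe primers quoted above, the following sketch computes the length, GC content, and reverse complement of primers 2843 and 2844. The helper names are hypothetical and not part of the original protocol; the hyphen that appears inside one primer in the typeset text is assumed to be a line-break artifact and is stripped before analysis.

```python
# Illustrative helpers (hypothetical names, not taken from the original methods).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def clean(seq: str) -> str:
    """Remove typesetting hyphens and spaces, and upper-case the sequence."""
    return seq.replace("-", "").replace(" ", "").upper()


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in the cleaned sequence."""
    s = clean(seq)
    return (s.count("G") + s.count("C")) / len(s)


def reverse_complement(seq: str) -> str:
    """Reverse complement of the cleaned sequence, written 5' to 3'."""
    return clean(seq).translate(COMPLEMENT)[::-1]


# Primer sequences as printed in the text (hyphen kept to show the cleaning step).
PRIMERS = {
    "2843": "TTTCCCAGTGAGCTCTCTCT",
    "2844": "TTGAGT-TGGGTTCTGTGACT",
}

for name, seq in PRIMERS.items():
    s = clean(seq)
    print(name, len(s), f"{gc_content(s):.0%}", reverse_complement(s))
```

Both primers are 20-mers with moderate GC content, which is consistent with their use as a standard PCR primer pair.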
The transcription start site of the hA33 antigen gene was determined using a 5′-RNA ligase-mediated rapid amplification of cDNA ends (RACE) method (GeneRacer; Invitrogen). The GeneRacer strategy aims to amplify only cDNA molecules containing the ultimate 5′ terminus. Briefly, 5 μg of total RNA extracted from both LIM1215 cells and normal human colonic epithelium were dephosphorylated with calf intestinal phosphatase to remove the 5′-terminal phosphates from all RNA molecules without a 5′-methyl guanosine cap structure. The RNA was then treated with tobacco acid pyrophosphatase to remove all the caps and release free 5′-phosphates, which were ligated to synthetic RNA adaptor molecules provided in the kit. Human A33 antigen gene-specific antisense primer, FJC1 (5′-GAGTGTAAACAACACAGGCCACATCTTC), was used with THERMOSCRIPT RT (Invitrogen) at 65°C to generate cDNA strands complementary to full-length mRNA molecules. These were amplified in touchdown PCRs using PLATINUM Taq DNA polymerase High Fidelity (Invitrogen) to generate products containing the transcription start site. The PCR products were gel-purified and cloned into the pCR 4-TOPO cloning vector (Invitrogen), and multiple clones were sequenced on both strands.
Cloning and Characterization of the mA33 Antigen Gene and 5′-Flanking Sequence-A 217-bp PCR product encompassing part of the V-type domain of mA33 antigen was generated using forward primer 1867 (5′-TGACAAAGAAATACATC) and reverse primer 1868 (5′-TCTGGCTTGGAGGGTGG), radiolabeled, and used to screen 1.25 × 10^6 clones of a 129/SvJ mouse genomic library in the Lambda FIX II vector (Stratagene). The inserts from three putatively positive clones (1A, 4A, and 7B; size range, 16-18 kb; Fig. 2) were excised using NotI and subcloned into pBluescript (KS+). The distribution of exons was determined by Southern analysis using mA33 antigen exon-specific oligonucleotide probes. To obtain mA33 antigen genomic clones containing more 5′ regions, a 132-bp probe encompassing most of exon 1 was generated using forward primer 3428 (5′-GCCAGAGGCCATAGCTTTAACCAGACAGCC) and reverse primer 6240 (5′-TGCACAGAGCATCCACACCA) and used to screen the 129/SvJ genomic library as described above. Two almost completely overlapping clones (11.1 and 15.1) were obtained that both contained exon 1 of mA33 antigen and ~13 kb of the 5′-flanking sequence. To determine the transcription start site of the mA33 antigen gene, we used 5′-RNA ligase-mediated RACE (as described above) with 5 μg of total RNA extracted from normal mouse colonic epithelium and the gene-specific antisense primer, FJC6M (5′-AAGGCAGGATGTGTGGTGTGGATGTTCT). PCR products were gel-purified and cloned into the pCR 4-TOPO cloning vector (Invitrogen), and multiple clones were sequenced on both strands.
Chromosomal Localization of the Human A33 Antigen Gene using Fluorescence in Situ Hybridization-The entire hA33 antigen genomic clone, SW1, was nick-translated using biotin-14 dATP and biotin-14 dCTP (Invitrogen) and hybridized to normal human metaphase spreads in two independent experiments. Chromosome preparations were obtained from phytohemagglutinin-stimulated normal human peripheral blood lymphocytes and cultured for 72 h. To induce R-banding, some of the cultures were synchronized with thymidine after 48 h, incubated overnight at 37°C, and treated with 5-bromodeoxyuridine the next morning, during the final late S phase, and harvested 6 h later (28). Cytogenetic harvests and slide preparations were performed using standard methods. Fluorescence in situ hybridization to metaphase chromosomes was performed as described previously (29), with a few modifications. The biotin-labeled probe (100 ng), 0.1 μg of Cot-1 DNA, and 0.5 μg of herring sperm DNA were dissolved in DenHyb hybridization mixture (Sigma Chemical Co.) and denatured at 72°C for 10 min to allow the Cot-1 DNA to anneal to repetitive sequences in the probe. The probe mixture was then applied to the slide and co-denatured with the metaphase spreads for 10 min at 82°C on a slide warmer. Hybridization was allowed to proceed for a minimum of 24 h in a 37°C incubator. The slides were washed at 37°C in 2× SSC and 50% formamide for 8 min. Biotin-labeled probe detection was accomplished by incubation with the fluorescein isothiocyanate-avidin conjugate (Intergen, Purchase, NY). Chromosome identification was performed by simultaneous hybridization with a chromosome 1-specific α-satellite repeat probe (Intergen) or by R-banding using 5-bromodeoxyuridine and mounting the slides in a modified anti-fade (p-phenylenediamine, pH 11, containing 0.01 μg/ml propidium iodide as counterstain) to produce an R-banding pattern (30).
Chromosomal Localization of the mA33 Antigen Gene-A 2.1-kb mA33 antigen cDNA probe (18) was used in Southern analyses to identify mA33 antigen restriction fragment length polymorphisms between C57BL/6J and Mus spretus mouse genomic DNAs digested with a number of restriction enzymes. Genetic mapping was achieved using a (C57BL/6J × M. spretus)F1 × M. spretus backcross (BSS panel 2) generated and distributed by the Jackson Laboratory (Bar Harbor, ME). Genomic DNA from 94 backcross animals was digested with BamHI, EcoRV, PstI, SacI, and ScaI, electrophoresed in 1% agarose gels, and transferred to Hybond N+ (Amersham Biosciences). Membranes were hybridized with the mA33 antigen cDNA probe, and the segregation of mA33 antigen alleles was analyzed. Gene linkage and order were determined using the MapManager v2.6.5 program (31).
RNA Preparation and Northern Blot Analysis-Total RNA was prepared from cell lines either by the method of Chomczynski and Sacchi (32) or by using Trizol reagent (Invitrogen) according to the manufacturer's instructions. RNA pellets were dissolved in autoclaved H2O treated with 0.1% (v/v) diethylpyrocarbonate (Sigma Chemical Co.). Northern analysis was performed as described previously (18). All images were generated using a PhosphorImager (Molecular Dynamics, Sunnyvale, CA).
Generation of Truncated Versions of hA33 Antigen 5′-Flanking Sequence-To identify cis-acting elements of the hA33 antigen gene regulating transcription, a nested series of fragments of the hA33 antigen 5′-flanking sequence were amplified by PCR of genomic clone 28A using the HiFi high-fidelity PCR system (Roche Molecular Biochemicals) and the forward and reverse oligonucleotide primers listed in Table I. The 3′ end of all fragments terminated in exon 1 at a position 36 bp upstream of the initiation of translation. The products were subcloned into the promoterless pGL3 basic vector (Promega, Madison, WI), immediately upstream of the firefly luciferase gene. The orientation and fidelity of all promoter fragments were determined by nucleotide sequencing on both strands.
Transient Transfection and Reporter Assays-The promoter activities of the different constructs were determined in transient transfection assays using the Dual Luciferase reporter system (Promega). Cells in exponential growth phase were seeded into 24-well plates as follows: 2 × 10^5 cells/well for SW1222, LS174T, and KM12SM; 1.75 × 10^5 for LIM1215; 1.25 × 10^5 for SW480; 1.5 × 10^5 for Caco-2 and LIM2099; and 1 × 10^5 for Colo526, U2-OS, and 293T. The next morning, cells (~80% confluent) were co-transfected with 0.5 μg of one of the A33 promoter-luciferase (A33-luc) constructs and 0.5 μg of the pRL-tk (Renilla luciferase-thymidine kinase promoter) vector (Promega) using 2 μl of FuGENE 6 transfection reagent (Roche Molecular Biochemicals). 48 h later, cells were washed with ice-cold phosphate-buffered saline and lysed using 100 μl of passive lysis buffer. Firefly and Renilla luciferase activities of cell lysates (20 μl) were determined sequentially using specific substrates in a Dynatech ML3000 luminometer with the gain setting on high. Transfection efficiencies were normalized by reference to the Renilla luciferase activity of the co-transfected pRL-tk vector. Results are expressed as relative firefly luciferase activity, corresponding to the number of light units obtained from cell lysates using the firefly luciferase substrate (substrate for the A33-luc constructs) divided by the number of light units obtained with the Renilla luciferase substrate (substrate for pRL-tk). All values represent the mean ± S.E. of triplicate cultures. All experiments were performed at least three times.
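The normalization described above can be sketched as follows: each well's firefly reading is divided by its Renilla reading, and the triplicate ratios are summarized as mean and standard error. This is a minimal illustration, not the authors' analysis script; the function names and the light-unit readings are hypothetical, and the standard error is assumed to be the sample standard deviation divided by the square root of the number of wells.

```python
from math import sqrt
from statistics import mean, stdev


def relative_luciferase(firefly, renilla):
    """Per-well ratio of firefly to Renilla light units."""
    if len(firefly) != len(renilla):
        raise ValueError("firefly and Renilla readings must be paired per well")
    return [f / r for f, r in zip(firefly, renilla)]


def mean_and_se(values):
    """Mean and standard error (sample SD / sqrt(n)) of replicate values."""
    return mean(values), stdev(values) / sqrt(len(values))


# Hypothetical light-unit readings for one construct in triplicate wells.
firefly_units = [1200.0, 1500.0, 1350.0]
renilla_units = [100.0, 120.0, 90.0]

ratios = relative_luciferase(firefly_units, renilla_units)
avg, se = mean_and_se(ratios)
print(f"relative firefly luciferase activity: {avg:.2f} +/- {se:.2f}")
```

Dividing by the co-transfected Renilla signal well by well, rather than dividing the two means, keeps the correction for transfection efficiency tied to the individual culture it came from.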
Preparation of Nuclear Proteins-Cells (1 × 10^7) were rinsed in phosphate-buffered saline, harvested in 10 ml of ice-cold PBSE (1× phosphate-buffered saline + 1 mM EDTA), and centrifuged. The cell pellet was resuspended in 1 ml of PBSE, re-pelleted, resuspended in 400 μl of ice-cold hypotonic buffer A (10 mM Hepes, pH 7.9, 10 mM KCl, 0.1 mM EDTA, 0.1 mM EGTA, 0.5 mM phenylmethylsulfonyl fluoride, 1 mM dithiothreitol, and 1× complete protease inhibitor mixture (Roche Molecular Biochemicals)), and incubated on ice for 15 min. Cells were lysed by the addition of 25 μl of Nonidet P-40 detergent (final concentration, 0.6% (v/v)) and vortexed vigorously for 10 s. The nuclei were pelleted by centrifugation, and nuclear protein was extracted by resuspension in 50 μl of ice-cold buffer C (20 mM Hepes, pH 7.9, 0.4 M NaCl, 1 mM EDTA, 1 mM EGTA, 1 mM phenylmethylsulfonyl fluoride, 1 mM dithiothreitol, and 1× complete protease inhibitor mixture) and incubated on ice for 15 min. The samples were centrifuged, and the supernatant containing nuclear proteins was collected and stored at −70°C. Protein concentration was determined using the Bradford Assay (Bio-Rad, Hercules, CA).
Western Blot Analysis-20 μg of nuclear protein extract were subjected to Western blot analysis as described previously (18). Human CDX1 was detected after overnight incubation at 4°C with a 1:200 dilution of an affinity-purified rabbit anti-mouse Cdx1 polyclonal antiserum (33), followed by incubation (60 min, 25°C) in horseradish peroxidase-conjugated goat anti-rabbit immunoglobulin G at a 1:10,000 dilution (Zymed Laboratories, South San Francisco, CA). Interactions were visualized by enhanced chemiluminescence (Amersham Biosciences).

RESULTS
Cloning and Characterization of the Human and Mouse A33 Antigen Genes-Both the human and mouse A33 antigen genes comprise seven exons spanning ~38 and 35 kb of genomic DNA, respectively (Fig. 2). Both the V-type and C2-type immunoglobulin-like domains of the A33 antigen extracellular domain are encoded by two exons (half domain exons; Fig. 2), as has been shown for several other members of the immunoglobulin superfamily including MUC18 and N-CAM (34, 35). Conservation of features (exon/intron organization, phase of splicing, and presence of a novel pair of cysteine residues) among hA33/mA33 antigens, CTX (marker of cortical thymocytes in Xenopus), and CAR (receptor for Coxsackie group B viruses and adenoviruses types 2 and 5) has defined a new family of cell surface proteins within the immunoglobulin superfamily that may have arisen by gene duplication of a common ancestor containing this characteristic gene structure (18, 36).
The preliminary chromosomal localization of the gene encoding hA33 antigen was established by Southern blot analysis of genomic DNA from hamster × human somatic cell hybrids (data not shown) and independently confirmed and refined by fluorescence in situ hybridization. The hA33 antigen gene localizes to 1q22-q24 (Fig. 3, A and B). To determine the chromosomal localization of the mA33 antigen gene, Southern analysis of the Jackson Laboratory backcross (BSS panel 2) using a mA33 antigen cDNA probe was performed to identify restriction fragment length polymorphisms between C57BL/6J and M. spretus genomic DNA. Comparison of the mA33 antigen-PstI haplotype distribution demonstrated co-segregation of mA33 antigen with the Pltr9 locus (37, 38) previously mapped to the distal region of mouse chromosome 1 (Fig. 3C). The most likely gene order positions mA33 antigen 1.10 ± 1.09 cM distal to Pmx1 (37, 39) and 8.87 ± 5.57 cM proximal to Apoa2 (Fig. 3D) (37, 40). This region is syntenic with the human chromosomal region 1q23-q25, thereby providing compelling evidence that the human and mouse genes cloned in this study are orthologs.
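Map distances of the form "x ± y cM" from a backcross are commonly derived from the recombination fraction between two loci and its binomial standard error. The sketch below illustrates that convention only; the text does not state how the quoted errors were computed, the function name is hypothetical, and the animal counts are illustrative rather than taken from the backcross data.

```python
from math import sqrt


def map_distance_cm(recombinants: int, animals: int) -> tuple[float, float]:
    """Recombination fraction and its binomial standard error, both in cM.

    Assumes distance (cM) = 100 * r, which is reasonable only for closely
    linked loci where double crossovers are rare.
    """
    if animals <= 0 or not 0 <= recombinants <= animals:
        raise ValueError("need 0 <= recombinants <= animals and animals > 0")
    r = recombinants / animals
    se = sqrt(r * (1 - r) / animals)
    return 100 * r, 100 * se


# Illustrative counts (an assumption, not reported in the text).
distance, error = map_distance_cm(recombinants=1, animals=91)
print(f"{distance:.2f} +/- {error:.2f} cM")
```

With a single recombinant among roughly ninety typed animals, the standard error is almost as large as the estimate itself, which is why distances to very close markers carry wide uncertainty.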
Determination of the Transcription Start Sites of the Human and Mouse A33 Antigen Genes-Our previous experiments (17, 18) had revealed that the human and mouse genes contained at least 345 and 102 bp of 5′-untranslated region, respectively. To determine the transcription start sites of both genes, we carried out 5′-RNA ligase-mediated RACE-PCR. Our results demonstrated that in human, a major transcription start site does indeed occur 345 bp upstream of the initiation of translation (Table II), corresponding exactly with the 5′ end of the longest cDNA clone (clone 18) (17). In addition, a number of other, more downstream transcription start sites were also indicated by our analysis (Table II). These are likely to represent true minor transcription start sites because we conducted the reverse transcription reaction at 65°C, thereby avoiding a technical limitation of the method employed (41). In mouse, we also obtained evidence for multiple start sites (Table II), with the major and most 5′ start site corresponding to a position 153 bp upstream of the initiation of translation, extending by 51 bp the 5′ end we had previously obtained using conventional 5′-RACE-PCR (18). These data suggest that the transcription start sites in the human and mouse genes do not correspond precisely.
Cloning and Sequencing of the 5′-Flanking Sequence of the mA33 Antigen Gene-Given the extremely similar expression patterns of the hA33 and mA33 antigens in vivo, it might be expected that the regulatory sequences critical for controlling tissue-specific expression would be conserved between the human and mouse promoters, as had previously been observed for the sucrase isomaltase (SI) promoter (42). The hA33 antigen promoter sequence was compared with the corresponding sequence in the mA33 antigen gene (Fig. 4) and found to be 65% identical, corresponding closely with the level of identity (67%) between the coding sequences in the two genes. There are two canonical TATA (TATAAA) box sequences in the human gene that lie 12 and 224 nucleotides upstream from the most 5′ transcription start site, but these are not conserved in mouse. Furthermore, because neither TATA box lies in a region 25-30 bp upstream of the start site, we infer that the initiation of transcription is not dependent on TATA sequences. However, both genes contain a conserved sequence, TCAGTTA (shown in brackets, Fig. 4), which closely corresponds to the consensus transcription initiator (Inr) sequence (Py Py A+1 N T/A Py Py) described by Lo and Smale (43), and a closely spaced conserved sequence GGACTTTG (dashed box, Fig. 4), which corresponds to a consensus downstream promoter element (44). The configuration of these elements is consistent with them playing a role in the initiation of transcription.
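To make the comparison with the Inr consensus concrete, the sketch below scores the conserved TCAGTTA sequence against Py Py A N T/A Py Py written in IUPAC codes (YYANWYY). It assumes the first base of the 7-mer aligns with the first consensus position, which the text does not state explicitly, and the function name is hypothetical.

```python
# IUPAC codes needed for the Inr consensus (Y = pyrimidine, W = A or T, N = any).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "Y": "CT", "W": "AT", "N": "ACGT",
}

INR_CONSENSUS = "YYANWYY"  # Py Py A(+1) N T/A Py Py


def match_flags(site: str, consensus: str = INR_CONSENSUS) -> list[bool]:
    """Per-position agreement between a site and an IUPAC consensus."""
    if len(site) != len(consensus):
        raise ValueError("site and consensus must have the same length")
    return [base in IUPAC[code] for base, code in zip(site.upper(), consensus)]


flags = match_flags("TCAGTTA")
print(f"{sum(flags)} of {len(flags)} positions match")
```

Under this alignment six of the seven positions agree with the consensus; only the final A departs from the pyrimidine expected at that position, in line with the description of the site as closely corresponding to an Inr.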
The promoter region alignment revealed several conserved binding sites for transcription factors previously implicated in the regulation of gene expression in intestinal epithelial cells (Fig. 4). A potential binding site for the gut- and intestine-enriched Krüppel-like factors (GKLF/KLF4 and IKLF/KLF5) (23, 45) and two putative CDX-binding sites (24) were perfectly conserved in the human and mouse A33 antigen genes (Fig. 4).
Characterization  Fig. 5A. The longest fragment tested (−2.3 kb) exhibited substantial promoter activity (black bars) in the two hA33 antigen(+) CRC-derived cell lines examined (SW1222 and LS174T) but was inactive in the hA33 antigen(−) CRC-derived cell line, SW480, and the three hA33 antigen(−) cell lines derived from nonintestinal tissues (U2-OS, 293T, and Colo526; Fig. 5B). All six cell lines supported the activity of the SV40 promoter and enhancer (Fig. 5B). This result suggested that all the cis-acting elements required to recapitulate endogenous hA33 antigen transcription in vitro may be contained in the −2.3-kb region.
The promoter activities of a series of nested fragment hA33-luciferase constructs (−2.3, −1.2, −0.6, and −0.44 kb) were then examined (Fig. 5C). All four constructs assayed were active in the four hA33 antigen(+) CRC cell lines (SW1222, LS174T, LIM1215, and KM12SM) and inactive in the four hA33 antigen(−) cell lines (SW480, LIM2099, Caco-2, and Colo526). Over a series of experiments (n = 4), there were no reproducible significant differences in activity among the four constructs within any individual cell line. Compared with the −2.3-kb construct, the −1.2-kb construct lacked a region of 5′-flanking sequence containing the consensus HNF1- and c-MYB-binding sites (Fig. 5A). These data imply that the HNF1 and c-MYB sites do not greatly influence transcription of the hA33 antigen gene and do not play a significant role in the differential hA33 antigen expression in CRC-derived cell lines.
The identification of putative cis-acting elements was refined further by analyzing the activities of a final series of deletion constructs. The −0.44-, −0.42-, −0.41-, −0.38-, and −0.36-kb constructs were all active in hA33 antigen(+) cell lines (SW1222 and LS174T) and inactive in the hA33 antigen(−) cell lines (SW480 and Caco-2; Fig. 5D). Deletion of 21 bp from the −0.44-kb construct (yielding the −0.42-kb construct) removed the potential GKLF/IKLF-binding site (45) and resulted in a consistently reproducible 3-fold decrease in reporter gene activity in SW1222 cells and a consistently reproducible 2-fold decrease in reporter gene activity in LS174T cells (Fig. 5D). Deletion of an additional 17 bp from the −0.42-kb construct (yielding the −0.41-kb construct) removed the more distal of two consensus binding sites for CDX1/CDX2 (24) and resulted in a further consistent and reproducible 2-fold decrease in activity in SW1222 and LS174T cells. Removal of 21 bp from the −0.41-kb construct (yielding the −0.38-kb construct), a region that lacks potential transcription factor-binding sites, did not produce an appreciable change in reporter gene activity in either SW1222 or LS174T cells. Deletion of an additional 21 bp from the −0.38-kb construct (yielding the −0.36-kb construct), resulting in removal of the more proximal consensus CDX1/CDX2-binding site, produced an ~6-fold decrease in reporter gene activity in SW1222 cells and an ~4-fold decrease in reporter gene activity in LS174T cells. Taken together, these data strongly implicate the consensus GKLF/KLF4-binding site and both the distal and proximal consensus CDX-binding sites in regulating the transcriptional activity of the hA33 antigen gene in CRC-derived cells in vitro. Finally, deletion of 141 bp from the −0.36-kb construct (producing the −0.22-kb construct) removed the major, most 5′ transcription start site of the hA33 antigen gene, resulting in negligible activity in the hA33 antigen(+) cells (Fig. 5D).
CDX1 Binds to a cis-Regulatory Element in the A33 Antigen Gene-To determine whether CDX1 could interact with consensus CDX-binding sites contained within the putative cis-regulatory region of the hA33 antigen gene promoter, we performed electrophoretic mobility shift assays using nuclear extracts from transfected 293T cells. Using a probe encompassing the more proximal of the two CDX-binding sites, a prominent retarded complex was produced by nuclear extracts from 293T cells transfected with mouse Cdx1 (Fig. 7, lane 5). This band was absent when the extract was incubated with a mutated probe (Fig. 7, lane 6), and its intensity was markedly diminished when a 25-fold excess of unlabeled probe was added (Fig. 7, lane 7). When an antibody raised against mouse Cdx1 was included in the incubation, the intensity of the band corresponding to the putative CDX1-DNA complex was markedly diminished, and a proportion was supershifted (Fig. 7, lane 8). This analysis strongly suggests that CDX1 binds specifically to the regulatory region in the A33 antigen gene defined here.

TABLE II
Positions of transcription start sites in human and mouse A33 antigen genes. Transcription start sites were determined using an RNA ligation-dependent 5′-RACE-PCR method as described under "Experimental Procedures." All positions are indicated relative to the initiation of translation, where A in ATG is designated +1.

DISCUSSION
We undertook a comprehensive characterization of the human A33 antigen gene to shed light on the mechanisms regulating its intestine-specific expression pattern. We had demonstrated previously that human CRC-derived cell lines may be either positive (+) or negative (−) for hA33 antigen expression (17) (Fig. 6C), even though normal intestinal epithelium and most primary CRC tumors are strongly positive.
In our study, we assumed that the group of six cell lines that express the A33 antigen (SW1222, LS174T, LIM1215, LIM1899, KM12SM, and LIM1863) had retained expression of all the molecules that normally play a role in regulating A33 antigen expression in vivo, whereas the non-A33 antigen-expressing cell lines (Caco-2, LIM2099, LIM2405, HCT116, SW480, and LIM2537) had not. Accordingly, such cell lines provided us with a suitable model system with which to analyze the highly restricted activity of the hA33 antigen promoter, and we believe that our in vitro findings may be highly relevant to the regulation of A33 antigen expression in intestinal epithelial cells in vivo. The results of our dissection of the hA33 antigen promoter demonstrate that ~400 bp of the 5′-flanking sequence (−443 to −36 by reference to the translation start site) are capable of recapitulating endogenous hA33 antigen gene expression in CRC-derived cells. Further refinement of this region points to a consensus GKLF/IKLF-binding site and two consensus CDX1/2-binding sites as important positive cis-acting elements in the hA33 antigen gene. The perfect conservation of these elements in the mA33 antigen gene adds weight to this concept, and data generated by electrophoretic mobility shift assay demonstrated that mCdx1 was capable of binding to the proximal CDX1/2-binding site in the regulatory region. Moreover, our analysis of mRNA expression of the relevant transcription factors in 10 CRC cell lines suggested that CDX1 may be absolutely required for expression of the hA33 antigen gene, at least in vitro. Furthermore, the results we obtained with SW480 cells, which express mRNAs encoding CDX2, GKLF/KLF4, and IKLF/KLF5 but not CDX1, indicated that these three transcription factors, acting alone or in combination, are insufficient to drive hA33 antigen reporter gene expression in vitro.

FIG. 4. Alignment of sequences upstream of the initiation of translation in the human and mouse A33 antigen genes. The sequences were aligned with the MegAlign program (DNAStar Inc.) using the Clustal algorithm with a PAM250 matrix followed by manual adjustment. The initiation of translation codons (ATG) are in open boxes, and the individual nucleotides corresponding to the most 5′ transcription start sites we detected for each of the two genes are in black boxes. Conserved nucleotides are shaded gray. The hA33 and mA33 antigen gene promoters share 65% nucleotide sequence identity over the region shown. The hA33 antigen gene contains a 25-bp tandem repeat, indicated by the forward and reverse dotted arrows between positions −198 and −149 (relative to the initiation of translation), that is not conserved in mouse. The 5′ ends of the nested fragments of the hA33 antigen gene promoter are also indicated with forward arrows. All terminate with the same 3′ end, indicated by a reverse arrow after position −36. The initiator (Inr) and downstream promoter elements that may contribute to the initiation of transcription are indicated by the bracketed area and the dashed box, respectively, between the −0.36- and −0.22-kb constructs. The conserved consensus gut/intestine-enriched Krüppel-like factor-binding site located between the beginning of the −0.44- and −0.42-kb constructs is highlighted in black. The two conserved consensus CDX-binding sites located between the −0.42- and −0.36-kb constructs are boxed.
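The promoter coordinates quoted above can be checked with simple arithmetic. The following is a minimal sketch, assuming that both endpoints of each interval are inclusive and that all positions lie upstream of the ATG (where A is +1, so there is no position 0 to cross). The function name `upstream_span_length` is hypothetical and is not taken from the original work.

```python
def upstream_span_length(start: int, end: int) -> int:
    """Length in bp of a segment lying wholly upstream of the ATG.

    Positions are negative integers counted from the A of ATG (+1).
    Both endpoints are treated as inclusive (an assumption).
    """
    if not (start <= end < 0):
        raise ValueError("expected start <= end < 0")
    return end - start + 1


# Minimal promoter fragment quoted in the text: -443 to -36
promoter_bp = upstream_span_length(-443, -36)

# Region holding the tandem repeat in the hA33 gene: -198 to -149
repeat_region_bp = upstream_span_length(-198, -149)

print(promoter_bp, repeat_region_bp)
```

Under these assumptions the fragment from −443 to −36 spans 408 bp, consistent with the "~400 bp" quoted in the text, and the interval from −198 to −149 spans 50 bp, which matches two copies of a 25-bp repeat unit.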
The clustering of critical regulatory elements within 100 bp 5′ of the major transcription start site in the hA33 antigen gene suggests that this region may comprise the core promoter of the TATA-less hA33 antigen gene. Inspection of CDX-binding sites in many other known target genes expressed by intestinal epithelium reveals very similar findings. Such genes include SI (48,49), lactase-phlorizin hydrolase (LPH) (9)(10)(11), guanylyl cyclase C (GC-C) (12,13), claudin-2 (26), CA1 (50,51), and proglucagon (52). In all cases, the promoters contain CDX-binding sites within a region 25-110 bp upstream of the transcription start site. Therefore, our studies, along with those of others, reinforce the notion that CDX transcription factors play a pivotal role in determining the expression patterns of many intestine-specific genes.
Comparing the expression patterns of the mA33 antigen and Cdx1 genes in vivo reveals that mA33 antigen and Cdx1 expression are closely coordinated during intestinal development. Expression of Cdx1 first appears in the distal hindgut endoderm at E13.5, ~24 h before the onset of mA33 antigen expression in exactly the same region (53,54). Meanwhile, Cdx2 expression is found exclusively in the proximal midgut endoderm at E13.5 (53). The Cdx1 expression domain then moves proximally and is distributed universally throughout the gut by E14.5 (53). The observation that the expression of the mA33 antigen precisely recapitulates the expansion of the Cdx1 expression domain exactly 24 h later (54) is entirely consistent with a role for Cdx1 in inducing mA33 antigen expression during intestinal development. Meanwhile, the genes encoding the pancreatitis-associated protein 1 (25) and claudin-2 (26) are the only other putative Cdx1 target genes identified to date, and their expression patterns closely resemble that of Cdx1 in intestinal epithelium in vivo (26,55). In contrast, genes expressed specifically by the differentiated cells in the small intestine (SI, LPH, and GC-C) and colon (CA1) are targets for CDX2 in vitro, and their expression patterns are almost superimposable with that of CDX2 in vivo (Fig. 1) (8-12, 50, 51, 56).

antigen promoter constructs (A33-luc) used to study regulation of the A33 antigen gene. Regions of the 5′-flanking sequence of the human A33 antigen gene were amplified by PCR and subcloned into the promoterless vector, pGL3basic, upstream of the firefly luciferase reporter gene. All hA33 antigen sequences terminated at the same 3′ end at a position 36 bp upstream of the ATG. The arrow denotes the most 5′ transcription start site in the hA33 antigen gene. The shaded boxes indicate transcription factor-binding sites for transcription factors previously shown to play a role in regulating genes expressed in intestinal epithelium. B, the 2.3-kb construct drives expression of firefly luciferase in two hA33 antigen(+) CRC cell lines, SW1222 and LS174T (■), but not in four hA33 antigen(−) cell lines (SW480, U2-OS, 293T, and Colo526). Relative firefly luciferase activity, shown along the horizontal axis, was corrected for transfection efficiency by reference to the activity of Renilla luciferase driven by the thymidine kinase promoter in a co-transfected plasmid. Each bar represents the mean ± S.E. of three independent transfections. All cell lines supported firefly luciferase expression driven by the SV40 promoter and enhancer (□) but gave negligible activity with the promoterless pGL3basic plasmid (data not shown). C, ability of nested fragments of the 5′-flanking sequence of the hA33 antigen gene to drive expression of firefly luciferase was restricted to four hA33 antigen(+) CRC cell lines (SW1222, LS174T, LIM1215, and KM12SM). Each bar represents the mean ± S.E. of three independent transfections. Over a number of experiments (n = 4), the relative luciferase activities were approximately equal for each of the −2.
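The reporter assay legend describes firefly luciferase activity corrected for transfection efficiency by reference to co-transfected Renilla luciferase, reported as the mean ± S.E. of three independent transfections. A minimal sketch of that calculation follows. It assumes that the correction is a simple firefly/Renilla ratio per transfection and that S.E. is the sample standard deviation divided by the square root of n; neither detail is stated in the text. The function name and all readings are hypothetical illustration values, not data from the study.

```python
from math import sqrt
from statistics import mean, stdev


def relative_luciferase(firefly, renilla):
    """Return (mean, S.E.) of firefly/Renilla ratios across transfections.

    Assumes one firefly and one Renilla reading per independent
    transfection, and at least two transfections so S.E. is defined.
    """
    if len(firefly) != len(renilla) or len(firefly) < 2:
        raise ValueError("need paired readings from >= 2 transfections")
    ratios = [f / r for f, r in zip(firefly, renilla)]
    return mean(ratios), stdev(ratios) / sqrt(len(ratios))


# Hypothetical raw readings from three independent transfections
firefly_counts = [1200.0, 1500.0, 1800.0]
renilla_counts = [100.0, 100.0, 100.0]

avg, se = relative_luciferase(firefly_counts, renilla_counts)
print(f"relative activity = {avg:.2f} +/- {se:.2f} (S.E., n = 3)")
```

Dividing by the Renilla signal before averaging keeps each transfection's efficiency correction paired with its own firefly reading, which is the usual motivation for a co-transfected control plasmid.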
In the adult, both the A33 antigen and CDX1 are co-expressed in the proliferative cells in intestinal crypts (16,33); however, A33 antigen expression is also seen in the differentiated cells. This could result from induction of A33 antigen expression by CDX1 in cells at the base of the crypts, and maintenance of A33 antigen expression during cell migration and differentiation via mRNA and/or protein stabilization. This question could be further addressed by determining the localization of mA33 antigen mRNA in vivo using in situ hybridization.
The temporospatial expression pattern of GKLF/KLF4 and IKLF/KLF5 during intestinal development through stages E10.5 to E15.5 has not been studied in detail. However, expression of IKLF/KLF5 in the adult intestine is reminiscent of that of CDX1, being found at its highest level in the lower crypt region (47). Meanwhile, expression of GKLF/KLF4 more closely mimics that of CDX2, being expressed by the differentiated cells in the small intestine and colon, where it has been implicated in regulating exit from the cell cycle by direct repression of cyclin D1 promoter activity (57). It is conceivable therefore that CDX1 and IKLF/KLF5 together regulate gene expression in the lower, proliferative compartment, whereas CDX2 and GKLF/KLF4 produce augmented patterns of gene expression in

Finally, our results suggest that a small region (400 bp) of the promoter of the A33 antigen gene is likely to be sufficient to produce intestine-specific patterns of gene expression in transgenic mice. Whereas this idea has yet to be tested, the general utility of the mA33 antigen gene locus in driving uniform expression in the intestinal epithelium was recently demonstrated in a proof of principle experiment. A "knock-in" mouse model was created whereby a cassette encoding an internal ribosome re-entry site and a truncated, oncogenic form of β-catenin (ΔNβ-cat) was inserted into the mA33 antigen gene between the stop codon and 3′-untranslated region by homologous recombination (58). Mice homozygous for the targeted transgene expressed ΔNβ-cat and developed a higher frequency of both spontaneous and chemically induced aberrant crypt foci and intestinal adenomas than their wild-type counterparts (58).