Mycobacterium bovis Bacillus Calmette-Guerin and Its Cell Wall Complex Induce a Novel Lysosomal Membrane Protein, SIMPLE, That Bridges the Missing Link between Lipopolysaccharide and p53-inducible Gene, LITAF ( PIG7 ), and Estrogen-inducible Gene, EET-1 *


Fighting off microbial infection has been one of the oldest wars waged by man. In order to develop more potent vaccines and therapeutic agents, it has become essential to understand the mechanism of pathogenesis at the point of host-microbial interaction. Despite the identification of a number of host genes involved in protective responses and those exploited by the microbe for pathogenesis, the evolving nature of pathogens and ranges of their variant and invariant components offer a virtually unlimited challenge. We focused on a microbial component derived from the cell wall complex of Mycobacterium bovis Bacillus Calmette-Guerin (BCG), a non-pathogenic vaccine strain of tuberculosis. The cell wall complex, which we call BCG-CWS, is a highly purified fraction with immunotherapeutic potential (1,2) and is mainly composed of peptidoglycan, arabinogalactan, and mycolic acid. The conserved microbial structures that are relatively invariant within a class of microorganisms, such as LPS, peptidoglycan, lipoteichoic acid, and lipopeptides, are recognized by Toll-like receptors (TLR), leading to the induction of immune and inflammatory genes (3,4). It has been reported recently from our laboratory that BCG-CWS is recognized by both TLR2 and TLR4, and it effectively induces dendritic cell (DC) maturation (5) through TNF-α production, which is comparable to the heat-killed or live BCG-induced maturation profile. These findings suggest that the component can be used to explore other potential genes regulated downstream of the signaling cascade, including those genes that could also possibly change during live mycobacterial infection. In a combined effort of mRNA differential display, suppression subtractive hybridization, and cDNA array analysis, several differentially expressed genes have been reported (6,7).
We mainly focused on identifying those gene fragments that are segments of new genes or those that are not yet defined as a part of known genes, with an aim of generating a comparative expression profile for a set of PAMPs (8,9). During our study, one of these genes appeared to be significantly similar to the previously identified human genes, LITAF (10) and PIG7 (11). LITAF is a novel protein binding to a critical region of human TNF-α promoter and is reported to be involved in TNF-α expression during LPS induction (10,12). The same gene was found to be induced severalfold during p53-mediated apoptosis (11,13) and was described as PIG7. Our findings raise the possibility that the gene we found to be induced by BCG-CWS and named SIMPLE, which is also a human homologue of rat EET-1 (14), could actually be the LITAF/PIG7. Thus, the induction of this gene in various contexts is intriguing and may not be simply a coincidence; here we describe the detailed analysis of the SIMPLE transcript, including its expression, localization, genomic organization, and its assignment under a new family.

* This work was supported in part by Organization for Pharmaceutical Safety and Research, grant-in-aid from the Ministry of Education, Science, and Culture, and the Ministry of Public Welfare of Japan. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
The nucleotide sequence(s) reported in this paper has been submitted to the DDBJ/GenBank™/EBI Data Bank with accession number(s) AB034747.
¶ Supported by Japan Science and Technology Corp. fellowship.
‖ To whom correspondence should be addressed: Dept. of Immunology, Osaka Medical Center for Cancer and Cardiovascular Diseases, Higashinari-ku, Osaka 537 Japan. Tel./Fax: 81 6 6973 1209; E-mail: tseya@mail.mc.pref.osaka.jp.

Preparation and Treatment of Monocytes with BCG-CWS and BCG-Peripheral blood mononuclear cells were isolated by standard density centrifugation with Ficoll-Paque (Amersham Pharmacia Biotech) from 400 ml of citrate phosphate dextrose-supplemented human blood. CD14+ monocytes were separated from peripheral blood mononuclear cells by anti-CD14-coated microbeads and a magnetic cell sorting column (Miltenyi Biotec GmbH). Cells were cultured overnight in 10-cm dishes in the presence of RPMI supplemented with 10% FBS and 2% human AB serum. The next day cells were treated for 8 h with 15 μg/ml BCG-CWS prepared in emulsion buffer (PBS containing 1% Drakeol 6VR and 1% Tween 80). Cells in the control plates were treated with 15 μl/ml emulsion buffer for the same time. For heat-killed and live BCG treatment, M. bovis BCG Tokyo strain was used at a concentration of 1 bacillus/monocyte for 8 h. In some experiments immature dendritic cells (iDC) were stimulated with BCG-CWS; the preparation of iDC was essentially the same as described previously (5).
Differential Display RT-PCR-Total RNAs from BCG-CWS-stimulated and unstimulated human monocytes were isolated using TRIZOL (Life Technologies, Inc.) reagent according to the manufacturer's instructions. Two micrograms of total RNA from control and from stimulated cells were reverse-transcribed using Superscript RT (Life Technologies, Inc.) by 12 types of T12MN (M = A/C/G; N = A/T/C/G) primers. Arbitrary primer (AP) sets were selected from RNAmap Kit I and II (Gene Hunter Corp., Nashville, TN), and PCR amplification was performed using Takara PCR kit components (Tokyo, Japan) and [32P]dCTP (Daiichi Pure Chemicals, Tokyo, Japan). PCR conditions, gel run parameters, fragment elution, and further amplification of PCR products were described previously (15). Fragments extracted from the gel were sequenced after TA cloning (Invitrogen, Carlsbad, CA) in an ABI-373 Sequencer (PE Applied Biosystems, Foster City, CA) using Dye terminator ABI sequencing Kit (PE Applied Biosystems).
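The figure of 12 anchor primers follows from the degenerate T12MN design: three choices for M multiplied by four choices for N. The following is a minimal sketch in Python that enumerates them; the helper name `anchor_primers` is illustrative only and is not part of any kit or library.

```python
from itertools import product

def anchor_primers(t_length=12, m_bases="ACG", n_bases="ATCG"):
    """Enumerate T(12)MN anchor primers: oligo(dT) followed by M, then N."""
    return ["T" * t_length + m + n for m, n in product(m_bases, n_bases)]

primers = anchor_primers()
print(len(primers))                # number of distinct anchor primers
print("T" * 12 + "GC" in primers)  # is T12GC (M = G, N = C) in the set?
```

T12GC, the anchor that later yields the GCAP2 band, is one member of this set.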
Northern Blot Analysis-Total RNAs from unstimulated and stimulated monocytes and other cells were isolated, and at least 10 μg of total RNA/lane was loaded on 1% formamide gel. RNA was transferred onto Hybond N+ membranes (Amersham Pharmacia Biotech); membranes were prehybridized for 30 min and hybridized for 1 h at 65°C in rapid hybridization buffer (Qiagen, Valencia, CA) in the presence of the appropriate 32P-labeled cDNA probe (SIMPLE ORF, GCAP2, PCR-amplified human and murine TNF-α ORF regions, and EST as murine SIMPLE/TBX1). Blots were washed at high stringency (65°C, 0.2× SSC, 0.1% SDS) and exposed to films for different times. The mRNA expression level was estimated from the radioactivity of the hybridized probe by phosphorimaging or autoradiography.
Purification of Escherichia coli-expressed SIMPLE and Immunization-The open reading frame of SIMPLE cDNA was amplified by 5′-gga tcc atg tcg gtt cca gga cct ta-3′ with BamHI site (gga tcc) and 5′-aag ctt cta caa acg ctt gta ggt gc-3′ with HindIII site (aag ctt) and ligated into the E. coli expression vector pQE-30 (Qiagen) in frame with the N-terminal His tag and used to transform competent E. coli M15[pREP4] cells (Qiagen). N-terminal (His)6-tagged SIMPLE was purified under denaturing conditions by nickel-nitrilotriacetic acid-agarose (Qiagen) chromatography according to the manufacturer's instructions. Polyclonal antibody against purified SIMPLE was generated by immunizing a rabbit following standard methods.
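As a quick sanity check on the primer design above, the two primers can be scanned for the BamHI (GGATCC) and HindIII (AAGCTT) recognition sequences, and the bases immediately after each site can be inspected for the start codon and the reverse complement of a stop codon. This is a minimal sketch; the helper names `clean`, `revcomp`, and `sites_in` are hypothetical and introduced only for illustration.

```python
SITES = {"BamHI": "GGATCC", "HindIII": "AAGCTT"}

def clean(primer):
    """Strip the triplet spacing used in the text and upper-case the sequence."""
    return primer.replace(" ", "").upper()

def revcomp(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def sites_in(primer):
    """Names of the recognition sequences found anywhere in the primer."""
    seq = clean(primer)
    return [name for name, site in SITES.items() if site in seq]

forward = "gga tcc atg tcg gtt cca gga cct ta"
reverse = "aag ctt cta caa acg ctt gta ggt gc"

print(sites_in(forward), clean(forward)[6:9])           # site, then start codon
print(sites_in(reverse), revcomp(clean(reverse)[6:9]))  # site, then sense-strand stop
```

The forward primer carries ATG directly after the BamHI site, and the reverse primer carries CTA, the reverse complement of the TAG stop codon, directly after the HindIII site.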
Constructs and Transfection-The coding region of SIMPLE (234-719 bp) was cloned into two mammalian expression vectors, pEGFP-C1 (CLONTECH) and pcDNA3 (Invitrogen), respectively. The primer combinations used for cloning into XhoI and HindIII sites of pEGFP-C1 were 5′-ctc gag cca cca tgt cgg ttc cag gac ctt a-3′ and 5′-aag ctt cta caa acg ctt gta ggt gc-3′. Primers containing BamHI and EcoRI sites, 5′-gga tcc cca cca tgt cgg ttc cag gac ctt a-3′ and 5′-gaa ttc cta caa acg ctt gta ggt gc-3′, respectively, were used for cloning into pcDNA3. All DNA constructs were checked by sequencing and transfected into various mammalian cell lines (HeLa, COS-7, and RK13) with Lipofectin reagent (Life Technologies, Inc.).
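The stated coordinates of the coding region can be checked against the length of the encoded protein with simple arithmetic. The sketch below assumes that the coordinates 234-719 are 1-based and inclusive and that the final codon is the stop codon; both are assumptions made for illustration, and `orf_summary` is a hypothetical helper.

```python
def orf_summary(start, end):
    """Length in bp, number of codons, and leftover bases for an inclusive range."""
    length = end - start + 1          # inclusive, 1-based coordinates (assumed)
    codons, remainder = divmod(length, 3)
    return length, codons, remainder

length, codons, remainder = orf_summary(234, 719)
residues = codons - 1                 # assumes the last codon is a stop codon
print(length, codons, remainder, residues)
```

Under these assumptions the region is 486 bp, or 162 codons with no leftover bases, which corresponds to a 161-residue protein followed by a stop codon.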
Immunoblotting-For detection of SIMPLE protein, SIMPLE-transfected RK13 cells, human monocytes, HeLa, THP-1 and DLD-1 cells were lysed in cell lysis buffer containing 1% Nonidet P-40, 10 mM EDTA, 140 mM NaCl, 20 mM Tris-HCl, pH 7.4, 1.0 mM phenylmethylsulfonyl fluoride, and 5 mM iodoacetamide. The cell lysates were solubilized either in reducing or non-reducing sample buffer and resolved in a 12.5% SDS-PAGE and transferred onto polyvinylidene difluoride membranes (Millipore, Bedford, MA). The blots were blocked with 10% non-fat milk and treated with rabbit anti-SIMPLE polyclonal antibody at a dilution of 1:5000. After washing, the blots were incubated with horseradish peroxidase-conjugated goat anti-rabbit IgG (Bio-Rad) and developed with ECL (Amersham Pharmacia Biotech).
Subcellular Fractionation-THP-1 cells were harvested by centrifugation at 1000 × g and then washed 3 times with PBS. Cell pellets were washed once with hypotonic buffer (10 mM KCl, 20 mM Hepes, pH 7.4, 1.5 mM MgCl2, 0.1 mM EDTA) and pelleted at 1000 × g for 10 min. The resulting cell pellet was then resuspended in hypotonic buffer and placed on ice for 20 min. The cell suspension was lysed with 50 strokes in a Dounce homogenizer and sequentially centrifuged at 1000 × g to pellet nuclei, 8000 × g to pellet mitochondria and lysosomes, and 100,000 × g to pellet endoplasmic reticulum, Golgi, and membranes. The supernatant after the last centrifugation was kept as the cytosolic fraction. The pellets, resuspended in homogenizing buffer, and the cytosolic fraction were analyzed directly by immunoblotting using rabbit anti-SIMPLE antibody or mouse anti-LAMP-1 antibody (PharMingen, San Diego, CA).
Triton X-114 Extraction and Deglycosylation Assay-THP-1 cells (~5 × 10^6 cells) were extracted in 0.5 ml of 0.5% Triton X-114 in PBS for 1 h on ice, and the nuclei and cell debris were removed by centrifugation at 1800 × g for 10 min at 4°C. The proteins were then partitioned into the detergent and aqueous phases according to the method of Bordier (35). The separated detergent and aqueous phases were analyzed directly by immunoblotting. THP-1 cells or RK13 cells expressing SIMPLE were solubilized in 20 mM Tris maleate buffer, pH 6.0, containing 1% Nonidet P-40, 0.1% SDS, 1.0 mM phenylmethylsulfonyl fluoride, and 5 mM iodoacetamide. To remove the debris, the solubilized sample was centrifuged at 10,000 × g for 30 min at 4°C. After centrifugation, the supernatant was incubated with 100 microunits of neuraminidase (Sigma) for 1 h at 37°C followed by 4.5 milliunits of O-glycanase (endo-α-N-acetylgalactosaminidase; Genzyme, Cambridge, MA) treatment for 16 h at 37°C.
Confocal Microscopy-HeLa cells incubated in 4-well glass slides (Nunc Inc., Naperville, IL) were fixed for 30 min with 1% paraformaldehyde in PBS, permeabilized with 0.5% saponin, 1% BSA/PBS for 30 min, and washed four times with PBS. After soaking in 1% BSA/PBS, the cells were treated for 1 h at room temperature with rabbit anti-SIMPLE polyclonal antibody or pre-immune rabbit antiserum in 1% BSA/PBS at a dilution of 1:500. The cells were washed with 1% BSA/PBS and treated for 30 min with fluorescein isothiocyanate-conjugated goat anti-rabbit IgG (1:100) (Organon Teknika Corp., West Chester, PA) diluted in 1% BSA/PBS. Cells were washed 4 times with 1% BSA/PBS and incubated for 1 h with mouse anti-LAMP-1 antibody (5 μg/ml) in 1% BSA/PBS, washed again, and incubated for 30 min with rhodamine B-conjugated goat anti-mouse IgG (Organon Teknika Corp.) (1:100). Visualization of acid pH compartments was performed by staining cells with 75 nM Lysotracker Red DND-99 (Molecular Probes, Leiden, The Netherlands) for 1 h at 37°C. Single- or double-stained cells were examined with a confocal laser scanning microscope (Olympus FLUOVIEW). Colocalized green (fluorescein isothiocyanate) fluorescence and red (Lysotracker or rhodamine B) fluorescence appeared yellow in the merged images.

BCG-CWS-induced Differential Display RT-PCR Product, GCAP2, Defines the 3′ End of LITAF/PIG7-Differential display RT-PCR was performed between unstimulated and BCG-CWS-stimulated monocyte RNA. A combination of pairs of primers, T12GC (3′ anchor) and AP2 (5′ arbitrary), produced a distinct band designated as GCAP2 in BCG-CWS-induced RNA (Fig. 1A). Amplification and sequencing of this band identified a 373-bp fragment having identity, with a few acceptable mismatches, to the paired primers at the ends (Fig. 2, boxed sequence). This fragment did not show any homology with any known genes in the data base; however, we found several human ESTs. Furthermore, the fragment when used as a probe showed a distinct signal on Northern blots, suggesting the differential display product belongs to a 2.4-kb transcript (Fig. 4). Next, we employed the in silico cloning strategy (16,17) to generate an extended virtual contig with a group of EST clones that were directly, or through another overlapping EST, linked to the GCAP2 sequence. This procedure identified a human EST (AW022014) that bridged between the GCAP2 fragment and ESTs denoted by the LITAF/PIG7-end sequence. Sets of EST contigs for this zone are also currently available under a TIGR transcript (THC116999, October, 2000) at NCBI. In order to verify this link, RT-PCRs with 5′ primers from LITAF and 3′ primers from the GCAP2 fragment and 5′-rapid amplification of cDNA ends from GCAP2 were conducted. Various combinations of RT-PCR primer pairs yielded bands of the expected size that were further verified by sequencing and confirmed the link. A representative result of RT-PCR is shown in Fig. 1B. The newly identified 3′ region exhibited a polyadenylation signal (Fig. 2) located 18 bp upstream of the beginning of the poly(A) tail shown. No such polyadenylation signal has been detected in the published LITAF sequence or in the deposited PIG7 sequence.
The poly(A) tail consisting of 19 A residues described for LITAF does not seem to be the true poly(A) tail; rather, the sequence corresponds to a stretch of 19 As (1754-1777 bp) interrupted by a G in the SIMPLE cDNA (Fig. 2). The presence of the same A stretch in both cDNA and genomic sequences (exon 4, Fig. 3C) further emphasizes that the long A-tailing starting at 1755 bp in the LITAF sequence is an internal cDNA region, whereas the poly(A) tail shown for SIMPLE was not derived from the genome. Thus, the GCAP2 fragment defines the 3′ end of the LITAF/PIG7 transcript, which in turn leads us to conclude that LITAF/PIG7 is a differentially expressed gene in BCG-CWS-treated human monocytes.
Identification and Verification of a Translational Frameshift in LITAF/PIG7-During sequencing of several RT-PCR and 5′-rapid amplification of cDNA ends products from human monocyte RNA, we consistently noticed that a single G was missing from a stretch of 5 consecutive guanine residues present in the coding region of LITAF (608-612 bp; GenBank/EMBL/DDBJ accession number NM_004862 or U77396)/PIG7 (454-458 bp; GenBank/EMBL/DDBJ accession number AF010312). Absence of this G residue created a translational frameshift (Fig. 3B) yielding a protein of 161 amino acids, whereas LITAF/PIG7 encoded a protein of 228 amino acids. This raised the question of whether there were two types of transcripts producing two different proteins. In order to verify this further, first we used Pfu polymerase to amplify the coding region from various RNA sources that included monocytes from different donors' blood, THP-1 monocytic cells (from which LITAF has been isolated), and DLD-1 colon carcinoma cells (from which PIG7 has been cloned). After cloning the PCR products in each case, we sequenced the critical region for 10-12 independent clones and detected the stretch of 4 Gs (Fig. 3A, top) instead of 5 Gs in all of them. Second, we looked for ESTs specific for that region because many tissues express the gene, and if the zone is polymorphic that could be represented by some of the ESTs. We found many ESTs (Fig. 3A) with 4 Gs but could not detect any human or murine EST showing 5 Gs for that particular region, either in the EST data base or in the pooled EST set for LITAF/PIG7 under the UniGene collection (Hs.76507, October, 2000). Third, we wanted to verify the region from the genomic sequence. For this purpose, we utilized the information from the unfinished genome sequencing project through the HTG data base at NCBI. We partially succeeded (a few contigs remain to be aligned) in identifying the genomic boundaries (Fig. 3C) for the entire SIMPLE transcript from a chromosome 16 clone, RP11-547D14 (accession number AC007616.3). The astonishing observation was that the region of dispute, 4 Gs or 5 Gs, falls into a potential splice junction (Fig. 3, B and C). The splice junction donor-acceptor rule (AG/GT) clearly favors the presence of 4 Gs in the cDNA. Finally, we focused on the deduced amino acid sequence of the transcript. If the protein with the changed C-terminal amino acid sequence truly represented a conserved protein, then we might find other homologues and orthologues in the data base. Owing to the small size of SIMPLE, we could easily find full-length or near full-length proteins from various species, described below (Fig. 9). The C-terminal domain was found to be the most conserved region in a number of proteins from different species, indicating the evolutionarily conserved feature of this domain. On the other hand, we failed to generate any such data for the C-terminal domain specific for LITAF/PIG7. Based on the above four lines of evidence, we conclude that the transcript encoding the 161-amino acid SIMPLE protein is the natural and most abundant transcript. For clarity, from this point, the 2368-bp transcript coding for 161 amino acids will be referred to as SIMPLE. For comparison, the first base number of SIMPLE cDNA (Fig. 2) has been kept the same as in LITAF.
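The effect of a single missing G in a run of five can be illustrated with a toy example. The sequence below is hypothetical and chosen only to contain a GGGGG run; it is not the LITAF or SIMPLE sequence. The sketch translates the 5-G and 4-G versions with the standard genetic code and shows that every codon downstream of the run changes.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, codons ordered with T, C, A, G at each position.
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def translate(dna):
    """Translate complete codons from position 0; trailing bases are ignored."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))

five_g = "ATGGCTGGGGGTACCTAA"              # hypothetical ORF containing GGGGG
four_g = five_g.replace("GGGGG", "GGGG", 1)  # same ORF with one G removed

print(translate(five_g))  # reading frame with 5 Gs
print(translate(four_g))  # shifted reading frame with 4 Gs
```

In this toy case the first three residues are shared and the sequence diverges afterwards, which mirrors how one extra G in the deposited sequences would give a different C-terminal region and a different protein length.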
FIG. 1. A, identification of a differentially expressed cDNA, GCAP2, in BCG-CWS-treated monocytes. Total RNA obtained from human monocytes treated with BCG-CWS or vehicle control (emulsion buffer) for 8 h was subjected to differential display RT-PCR using 3′ anchor primer (5′-T12GC-3′) and 5′ arbitrary primer AP2 (5′-GACCGCTTGT-3′). The PCR products were separated on standard sequencing gels and visualized by autoradiography. The arrow indicates a differentially expressed band of 343 bp (GCAP2) in BCG-CWS-RNA lane. B, human EST, AW022014, identifies the link between LITAF/PIG7 and GCAP2. Schematic presentation at the top shows the RT-PCR strategy to confirm the link between GCAP2 and the 3′ end of LITAF/PIG7. The result of the RT-PCR using total RNA from BCG-CWS-treated monocytes is shown below the diagram. The sequences of primer F and primer R are (5′-gggccttcctcagcaccatc-3′) and (5′-gaatcctggcttgctgcttg-3′), respectively. The expected amplified product of 1790 bp is indicated by an arrow for lane 2; the control lane represents an RT-PCR performed in the absence of reverse transcriptase.

SIMPLE Is a Widely but Variably Expressed Transcript-Multiple tissue Northern blots from CLONTECH were independently hybridized with full-length cDNA, the coding region, and with the GCAP2 fragment. Three types of hybridizations successively detected a single transcript of the same size on multiple tissues. The result obtained by hybridizing with the coding region of SIMPLE is presented in Fig. 4. Most of the human tissues except testis expressed SIMPLE abundantly, and this was also consistent with the fact that there was a vast collection of human ESTs (UniGene Hs.76507, October, 2000) from various organs and a single EST (AA62566) of testis origin. With respect to the RNA size marker provided on the multiple tissue Northern blot, the message size fits reasonably with the full-length SIMPLE (~2.4 kb).
The message size appeared to be the same in a number of human cell lines and in 12 paired tumor-normal colon carcinoma samples (data not shown), suggesting there is no aberrant or variant transcript that could be detected by size difference on a Northern blot. The tissue distribution pattern also suggests SIMPLE probably has a more generalized function rather than playing a unique role for the sole benefit of the monocyte/macrophage lineage. However, there is great variability in the relative expression level among the tissues tested; peripheral blood lymphocytes showed the highest level of expression, whereas expression was low in brain and, compared with ovary, low in testis. The inter-library comparative profile of the SAGE tag, TGAATACTAC (Fig. 2), indicates that the expression could be regulated by a hormone in the human reproductive tissues, as evident in LNCaP with DHT versus LNCaP without 5α-dihydrotestosterone or MCF7 3 h versus MCF7-estradiol 3 h. The SAGE data further demonstrate that the transcript might be differentially expressed in certain malignancies, such as prostate, breast, and ovary.
Expression of SIMPLE Is Induced by BCG-CWS and BCG-Differential display fragments often give rise to false positives, so it was essential to reverify the differential expression in response to stimulation. In different batches of monocytes and iDC, BCG-CWS enhanced the expression of SIMPLE about 2-fold at 8 h of induction (Fig. 5A), the same incubation time as for the differential display study. The expression was also checked in response to heat-killed and live M. bovis BCG; both were found to be potent inducers of this gene (Fig. 5B) at 8 h of infection. M. bovis BCG doubling time in monocytes was ~20 h, suggesting the induction was replication-independent. However, a detailed study is required to confirm whether the expression was phagocytosis-dependent or whether the expression undergoes alteration with bacilli growth.
FIG. 2. The 2.4-kb SIMPLE cDNA (GenBank™ AB034747) and deduced amino acid sequence. The cDNA region (GCAP2) identified by differential display is shown within a box. The large shaded region represents the connecting sequence between LITAF and GCAP2, which also overlaps with the EST AW022014 sequence. At and near the end of the 3′-untranslated region, the poly(A) tail and polyadenylation signal are indicated in uppercase, boldface letters. The poly(A) stretch, which corresponds to the 3′ end of the LITAF sequence, is shown in lowercase, boldface letters. Underlined sequences in the 3′-untranslated region represent SAGE tags, and the vertical bars correspond to the potential exonic junctions. In the amino acid sequence, a putative transmembrane region is double underlined; proline and cysteine residues are inside squares and circles, respectively; the dileucine and YXXΦ motifs are hatched, and potential phosphorylation sites are marked by asterisks (CKC2) or diamonds (PKC).

SIMPLE Versus TNF-α Induction in Human and Murine Monocytes-BCG-CWS and BCG are both potent inducers of TNF-α, and LITAF has been described as a novel regulator of TNF-α expression during LPS stimulation. We wanted to see if there was any correlation of SIMPLE and TNF-α expression in our experimental condition. Human monocytes and murine RAW cells were stimulated by LPS and BCG, and at different time points RNA was prepared. The blots were first probed with species-specific SIMPLE cDNA and then stripped and reprobed with species-specific TNF-α cDNA (Fig. 6). In human monocytes BCG-CWS as well as LPS both enhanced the SIMPLE expression, but the induction was faster in the case of LPS as evident from the peak expression levels, 2 versus 4 h by LPS and BCG-CWS, respectively (Fig. 6A). The induction of TNF-α expression was prior to the SIMPLE induction, suggesting TNF-α itself could be an inducer. This point was further verified by stimulating monocytes with recombinant TNF-α (Fig. 6B); however, the induction was not as robust as found in LPS treatment. In the case of a murine macrophage cell line, RAW, expression of SIMPLE gradually increased with a peak level at 24 h by BCG-CWS, whereas induction by LPS was observed at early time points (Fig. 6C). Despite the degradation of RNA in the control lane of LPS, it was not difficult to conclude that the expression of TNF-α was prior or concomitant to the SIMPLE induction as observed in human monocytes. Again, rapid induction by LPS and gradual induction by BCG-CWS for SIMPLE message has been reflected in RAW cells, suggesting PAMPs derived from mycobacteria and Gram-negative bacteria differentially modulate the expression of SIMPLE. Interestingly, RAW cells showed a second transcript of ~1.5 kb and that was also altered due to the induction. We also tested THP-1 cells under similar conditions, but we could not detect significant differences in SIMPLE expression between stimulated and unstimulated cells.
THP-1 cells themselves had a good basal expression level, which we found to be reduced upon phorbol 12-myristate 13-acetate treatment and which could not be elicited further by LPS/BCG-CWS treatment (data not shown).
Sequence Analysis of SIMPLE-Next we focused on structural analysis of the deduced amino acid sequence of SIMPLE to characterize its possible function. One easily noticeable feature of the deduced amino acid sequence of SIMPLE is its proline-rich N terminus and cysteine-rich C terminus (Fig. 2). The total proline content is 15%, which is about 3 times higher than the proline content typically found in eukaryotic proteins. The majority (87%) of the prolines are concentrated in the N-terminal half; the proline content in this region exceeds 22%, and the region is completely devoid of cysteine. Conversely, the C-terminal half of 68 amino acids lacks proline abundance and possesses 11 cysteines. Unlike many proline-rich proteins, SIMPLE does not have many glutamines; instead, it is often punctuated by serine-threonine and to some extent by tyrosine, and the majority of these residues are present in the N-terminal half. Proline-rich proteins with repetitive or non-repetitive motifs are found to be involved in a variety of functions (18,19). However, in its overall pattern of proline in combination with serine, threonine, and especially tyrosine, SIMPLE shows similarity with the tyrosine-hydroxyproline-rich extensin family of plant cell wall proteins elicited during infection and wounding (20).
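The percentages quoted above can be cross-checked with back-of-the-envelope arithmetic. The sketch below is not the authors' calculation: the residue counts are inferred from the stated percentages, and the N-terminal region is assumed to be the 161 residues minus the 68-residue C-terminal half. All variable names are illustrative.

```python
TOTAL_RESIDUES = 161          # length of SIMPLE stated in the text
C_TERMINAL_RESIDUES = 68      # C-terminal half stated in the text
N_TERMINAL_RESIDUES = TOTAL_RESIDUES - C_TERMINAL_RESIDUES  # assumption: the remainder

# Counts inferred from the rounded percentages, not read from the sequence.
total_prolines = round(0.15 * TOTAL_RESIDUES)
n_terminal_prolines = round(0.87 * total_prolines)

n_terminal_proline_fraction = n_terminal_prolines / N_TERMINAL_RESIDUES
cysteine_fraction_c_terminal = 11 / C_TERMINAL_RESIDUES

print(total_prolines, n_terminal_prolines)
print(f"{n_terminal_proline_fraction:.1%}")    # proline share of the N-terminal region
print(f"{cysteine_fraction_c_terminal:.1%}")   # cysteine share of the C-terminal half
```

Under these assumptions the inferred N-terminal proline content comes out just above 22%, in line with the figure given in the text.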
The most striking feature of SIMPLE is the C-terminal domain; motif search and profile scan indicate that amino acid residues 96-152 of SIMPLE are similar to the C3H4 type zinc RING finger (21,22). As shown, cysteines (Fig. 9A, light green vertical bars) and possible loop1 and loop2 regions of SIMPLE can be aligned with a group of known RING finger proteins (Fig. 9, A and B). The alignment fits well with the consensus of RING and allied zinc finger motifs (23) such as FYVE and FYVE-related fingers (24,25). The protein does not have any known nuclear localization signal sequence, but the k = 9/23 in k-NN prediction (PSORT) suggests a 52% probability for nuclear localization. The presence of proline-cysteine-rich domains is characteristic of several RING proteins including many transcription factors (26-28), and they are found to be involved in diverse cellular functions through RNA or DNA binding, protein-protein interaction, or both (23, 29-31).
Apparently SIMPLE has all the features to qualify as a RING protein and to be involved in transcriptional regulation. However, the hydropathy profile (Fig. 8A) suggests SIMPLE has a potential TM domain, and the region lies within the predicted RING structure (Fig. 9A). The sequence has been analyzed through several transmembrane prediction programs available at the Expasy site and is detected as an integral membrane protein with the same TM domain. The SOSUI-predicted TM-spanning region of 23 amino acid residues (Fig. 2) is long enough to allow stable integration in the membrane. Another intriguing feature of this C-terminal domain is the presence of a di-leucine motif and a YXXΦ motif (Fig. 2), where X is any amino acid and Φ is a bulky hydrophobic amino acid. These motifs are known to interact with a family of adapter protein complexes during intracellular sorting of transmembrane proteins and membrane receptors and also to function in lysosomal/endosomal and trans-Golgi network targeting (32, 33). The above analysis prompted us to examine whether SIMPLE is a member of the nuclear or cytoplasmic RING family proteins or a completely new type of membrane protein localized in the cell surface or intracellular vesicular compartment.
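The two sorting motifs described above can be located in a protein sequence with a simple pattern scan. The following is a minimal sketch: the choice of L, I, M, F, and V as the bulky hydrophobic residues for Φ is an assumption made for illustration, the helper `find_motifs` is hypothetical, and the peptide used is a made-up example, not the SIMPLE sequence.

```python
import re

PHI = "LIMFV"  # assumption: residues treated here as bulky hydrophobic
MOTIFS = {
    "di-leucine": re.compile(r"LL"),
    "YXXPhi": re.compile(rf"Y..[{PHI}]"),
}

def find_motifs(protein):
    """Return (motif name, 1-based start position, matched residues) tuples."""
    hits = []
    for name, pattern in MOTIFS.items():
        for match in pattern.finditer(protein):
            hits.append((name, match.start() + 1, match.group()))
    return hits

toy_peptide = "MSAPLLKTYQRLGA"  # hypothetical sequence for illustration only
for hit in find_motifs(toy_peptide):
    print(hit)
```

A match by pattern alone does not show that a motif is functional; it only flags candidate positions of the kind marked in Fig. 2.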
Subcellular Localization of SIMPLE-We first took the routine approach of constructing GFP fusion proteins and prepared N-terminally GFP-tagged versions of SIMPLE, as it lacks a signal sequence. Expression in 3 different cell lines, COS-7, NIH3T3, and HeLa, produced comparable results. A representative example of expression in COS-7 cells is shown in Fig. 7A; N-terminally GFP-fused SIMPLE was concentrated in a paranuclear position with a number of vesicles surrounding the nucleus. Expression of the GFP-only vector showed diffuse cytoplasmic and nuclear fluorescence (data not shown). A paranuclear position is mainly suggestive of the Golgi; however, the trans-Golgi network, newly synthesized transport vesicles, and lysosomes are also present around this location, and overexpression makes the localization particularly difficult to ascertain. Therefore, a polyclonal antibody was generated against bacterially expressed His6-SIMPLE and used, after checking the specificity, for immunostaining in HeLa cells. Results of confocal analysis are shown in Fig. 7B; SIMPLE is distributed with LAMP-1- and Lysotracker-stained vesicular compartments, indicating SIMPLE is associated with perinuclear lysosomes and late endosomes. Cells with or without permeabilization were also analyzed by fluorescence-activated cell sorter; no definitive staining was observed in the plasma membrane or in cells stained with preimmune serum, whereas permeabilized cells showed a strong fluorescence shift (data not shown).
In parallel, subcellular fractionation was carried out to check whether the immunostaining pattern corresponds. As evident from the Western blot of various subcellular fractions, the lysosome-enriched 8,000 × g pellet (Fig. 7C, lane 2, top) contained SIMPLE; the same fraction showed the strongest signal for the presence of LAMP-1-positive vesicles (Fig. 7C, lane 2, below). Absence of any cell surface expression further suggests that SIMPLE is a lysosomal/late endosomal residence protein rather than a recycling receptor. This work clearly demonstrated that SIMPLE is not a nuclear or cytosolic protein even though most of the RING proteins are found to be nuclear or multicomplex cytosolic proteins.

SIMPLE Is an Integral Membrane Protein of the Lysosome-RING proteins like EEA1 are localized in early endosomes and utilize their RING domain for peripheral membrane anchoring (34). To see whether SIMPLE is an integral membrane protein of the lysosome, we analyzed the partition of the protein during phase separation in a solution of Triton X-114 according to Bordier (35). In this method integral membrane proteins with an amphiphilic nature are recovered in the detergent phase, whereas peripheral and cytosolic proteins remain exclusively in the aqueous phase. After two rounds of Triton X-114 extraction, the protein was completely recovered in the detergent phase (Fig. 8A), demonstrating that SIMPLE is an integral membrane protein of the lysosome. Simultaneously, a duplicate blot was tested with a control 7-TM integral membrane protein to monitor the efficiency of extraction (data not shown).
Lysosomal membrane proteins are in general heavily N-glycosylated to be protected from unwanted degradation inside the lumen. SIMPLE does not have any N-glycosylation sites, although it has several O-glycosylation sites as predicted by the NetOglyc 2.0 program (Fig. 8B). In order to verify its glycosylation status, we used THP-1 cells as a natural source of the protein and SIMPLE cDNA-transfected rabbit RK13 cells for expressed protein. Expressed SIMPLE or THP-1-derived protein remained unaffected by O-glycanase (Fig. 8B; THP-1 data not provided). Under similar conditions deglycosylation was observed for a known O-glycosylated protein (CD46), suggesting that SIMPLE is an unglycosylated protein, which is consistent with the fact that O-glycosylation sites are poorly defined and not necessarily used (36). The protein is approximately 24 kDa (in both reduced and non-reduced conditions) as detected in THP-1, human monocytes, HeLa, DLD-1, and in RK13-transfected cells. However, the apparent molecular size (24 kDa) of unglycosylated SIMPLE is higher than its unmodified calculated mass of 17 kDa; the slower migration could be attributed to the phosphorylation status or to the proline richness of the protein, as noted for other proteins such as Zyxin, Krüppel, and TESK-1 (37,38).
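The size of the gap between the calculated and apparent mass can be put in rough numbers. The sketch below uses the common rule of thumb of about 110 Da per amino acid residue, which is an approximation introduced here for illustration and not the method used to obtain the 17-kDa value in the text; an exact mass would require the actual sequence.

```python
RESIDUES = 161                 # length of SIMPLE stated in the text
AVERAGE_RESIDUE_MASS_DA = 110  # rule-of-thumb average, an assumption
APPARENT_MASS_KDA = 24         # SDS-PAGE estimate stated in the text

estimated_mass_da = RESIDUES * AVERAGE_RESIDUE_MASS_DA
estimated_mass_kda = estimated_mass_da / 1000
mobility_ratio = APPARENT_MASS_KDA / estimated_mass_kda

print(estimated_mass_kda)        # rough calculated mass in kDa
print(round(mobility_ratio, 2))  # apparent mass relative to the rough estimate
```

The rough estimate lands close to the calculated mass quoted above, so the protein migrates as if it were roughly one third larger than expected from its length alone.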
Assignment of SIMPLE into a New Family-As mentioned above, the C-terminal domain of SIMPLE resembles the RING structure but is interrupted by a TM region. This unusual structural feature does not satisfy RING family characteristics, yet it is unique in itself. We wanted to know whether proteins containing this structural feature compose a new family, and we searched data bases to collect SIMPLE homologues and related proteins (see Fig. 9). The homology among human (SIMPLE), rat (EET-1; accession number U53184), and murine (TBX1; accession number AF171100) proteins is high (90-91%). A considerable degree of homology is present with Zebrafish (53%; EST AW184464) and chicken (72%; EST AI979890), and the most conserved region appeared to be the C-terminal domain, 70-75 residues long (N-terminal alignment for these proteins has not been shown). Based on this region of SIMPLE, we performed TBLASTN and BLASTP searches against several data bases and 2 rounds of PSI-BLAST iteration. Several hits appeared in these query modes, and simple visual inspection showed that they share a pattern. An alignment of 18 sequences from all the species (2 human, 2 rodent, 1 fish, 1 avian, 5 insects, 8 nematode, and 1 from plant) is provided in Fig. 9C. Transmembrane prediction analysis was done for each protein, and strong TM regions were underlined whenever detected. The alignment makes it more convincing that this domain has a consensus (shown below the alignment profile) that is clearly distinguishable from the RING-like domain, while retaining the first and last pair of cysteines conserved among the zinc finger family. Between the cysteine dyads there is a long variable region that often harbors the membrane-spanning region. The TM region is preceded and followed by two unique consensus sequence signatures, CPXCX₅T and #X₃#X₂HXCX₂C, respectively (Fig. 9C).
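The two signatures flanking the TM region can be turned into a simple pattern search. The following is a minimal sketch, not the procedure used in this work: it assumes that X stands for any residue, that a number after a symbol is a repeat count, and that "#" follows the MultAlin consensus convention (any of N, D, Q, E), which the text does not state explicitly. The function names and the test sequence are hypothetical and are not taken from any data base entry.

```python
import re

def signature_to_regex(signature, hash_class="NDQE"):
    """Translate a signature such as 'CPXCX5T' into a regular expression.

    Assumptions (not defined in the source text): 'X' is any residue,
    a trailing number is a repeat count, and '#' is one residue from
    hash_class (MultAlin convention, here N/D/Q/E).
    """
    parts = []
    i = 0
    while i < len(signature):
        symbol = signature[i]
        i += 1
        # Read an optional repeat count that follows the symbol.
        j = i
        while j < len(signature) and signature[j].isdigit():
            j += 1
        count = int(signature[i:j]) if j > i else 1
        i = j
        if symbol == "X":
            unit = "."
        elif symbol == "#":
            unit = "[" + hash_class + "]"
        else:
            unit = re.escape(symbol)
        parts.append(unit if count == 1 else unit + "{" + str(count) + "}")
    return "".join(parts)

def find_signature(sequence, signature):
    """Return (0-based start, matched text) for each non-overlapping hit."""
    pattern = re.compile(signature_to_regex(signature))
    return [(m.start(), m.group()) for m in pattern.finditer(sequence)]

# Hypothetical sequence built only to exercise both signatures.
toy = "MS" + "CPYCAAAAAT" + "LLLLL" + "DAAAEAAHACAAC" + "K"
n_side = find_signature(toy, "CPXCX5T")
c_side = find_signature(toy, "#X3#X2HXCX2C")
```

In this toy case the first signature is found before the hydrophobic stretch and the second one after it, which mirrors the arrangement described for the domain. A different residue class for "#" can be passed through `hash_class` if the consensus symbol is defined otherwise.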
The majority of the proteins in this family seem to be small in size (around 160 aa); however, we can see the domain and the motifs in proteins of larger sizes such as C16orf5 (261 aa) and DmCG13515 (283 aa). The domain is not necessarily restricted to the C terminus of all proteins, as in the case of C16orf5 (39) and CeT26805 (see TL/CL), suggesting a modular domain that could be utilized by a variety of proteins at different locations but serving a common function. The 386-amino acid Caenorhabditis elegans protein CeT26805 is a case in point. The N-terminal region (residues 31-98) of this protein shows the SIMPLE-like domain signature, and the rest of the sequence (residues 105-386) is 70% similar to the WD repeat region of human β-COP. CeT26805 is a hypothetical protein, yet it is a good example where the SIMPLE domain has been fused to the WD domain, generating a new protein. Based on our analysis above we propose to designate this family, the domain, and the motif by the name of SIMPLE. A plant protein has been shown as an example, just below the consensus, to show that the hypothetical protein follows the consensus pattern and can be considered as a member of this new family (Fig. 9C).
DISCUSSION
The identification of SIMPLE, which is similar to LITAF or PIG7 transcripts but with a different coding potential, was an accidental finding. We confirmed in various ways the presence of SIMPLE as a single transcript and protein. Our main findings are as follows: 1) identity of SIMPLE with LITAF/PIG7 at the nucleotide level but not at the level of coded protein; 2) perfect agreement of all exon-intron junctions in the SIMPLE transcript but not in LITAF/PIG7; 3) lack of evidence for the presence of 5 Gs in the coding region sequences from monocytes, THP-1, and DLD-1 cells; and 4) an abrupt change (Fig. 9A) in the amino acid sequence of LITAF/PIG7, compared with SIMPLE, EET-1, and TBX1, which supports a frameshift in LITAF due to the misincorporation of an additional G residue. However, the region of dispute corresponds to a splice junction; aberrant splicing or allelic polymorphism of 4 Gs versus 5 Gs may still create a LITAF/PIG7-coded protein, and it could act as a dominant negative form against the natural version, SIMPLE.
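The effect of one additional G in a run of Gs can be illustrated with a short translation sketch. The coding sequences below are hypothetical and were made up for illustration; they are not the SIMPLE or LITAF/PIG7 sequences. The sketch only assumes the standard genetic code and shows that the protein is unchanged up to the G run and diverges abruptly after it.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def translate(cds):
    """Translate a coding sequence in frame 1 until a stop codon or the
    last complete codon, whichever comes first."""
    protein = []
    for pos in range(0, len(cds) - 2, 3):
        amino_acid = CODON_TABLE[cds[pos:pos + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

# Hypothetical sequences: identical except for the length of the G run.
FOUR_G = "ATGCCAGGGGTCACCTAA"   # run of 4 Gs
FIVE_G = "ATGCCAGGGGGTCACCTAA"  # same sequence with one extra G

protein_4g = translate(FOUR_G)
protein_5g = translate(FIVE_G)
```

Both toy proteins start with the same residues, and every residue downstream of the G run differs, including the loss of the original stop codon. This is the kind of abrupt change in the alignment that points to a frameshift rather than to ordinary sequence divergence.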
Similarity with RING Family-The next question is whether SIMPLE can be considered as a variant of the zinc RING proteins because it has a similar sequence motif. We have not examined the zinc binding potential of this protein due to the difficulties of purifying its native form, and in addition, the predicted RING region has been found to be disrupted by a single potential TM domain. Our experimental evidence, including the phase separation, supports that the protein is tightly fastened in the intracellular membrane compartment, suggesting the predicted TM domain within the RING is the anchoring region. The TM domain signature also corresponds to the BLOCK pattern of the 5th TM domain of the Srg family 7-transmembrane receptor (40) of C. elegans, which further supports the integral nature of the domain. The presence of this TM domain dampens the possibility of considering this protein as a RING protein or even to be a divergent type. However, it is apparent from the alignment that the C-terminal domains of SIMPLE family genes are nicely bracketed by a pair of CXXC motifs (Fig. 9C, green bars), resembling the first and last pair of cysteines in zinc finger proteins. This feature, together with the fairly well spaced additional cysteines within the bracket, can easily be mistaken as a RING-like contour. We conclude from the alignment profile that the proteins under this family share a domain that has similarity in organization with the RING domain; however, the presence of the TM domain limits further comparison.
FIG. 9. A, ClustalW alignment of SIMPLE with LITAF/PIG7, EET-1 (rat homologue), and TBX1 (murine homologue). Identical residues are in red with an asterisk; strongly similar residues are in green with a colon, and weakly similar residues are in blue with a single dot. From residue 127 of LITAF/PIG7 that corresponds to the region of 5 Gs in the coding sequence, there is an abrupt change (sequence in black italics) in amino acid sequence. B, cysteine residues in the C-terminal domain of SIMPLE, EET-1, and TBX1 correspond to the conserved positions (numbered in magenta text, 1-8) of cysteines and histidine in the RING domain. The aligned RING proteins were collected from Refs. 24 and 64. Below these sequences the consensus of RING family and related zinc finger domains (23,25) are also shown in magenta. The number of amino acid residues between the conserved cysteines and histidine is omitted for the alignment purpose and is shown as dashed lines. C, alignment of the C-terminal domain of SIMPLE with sequences from other species.
Comparison with Major Lysosomal Membrane Proteins-SIMPLE is a new lysosomal membrane protein; a motif search showed that the N-terminal 14 amino acids (residues 10-23) of SIMPLE have homology with the LAMP block. We compared SIMPLE with major integral membrane proteins of lysosomes (41), LAMPs, LIMPs, and also with Endolyn (42,43). ClustalW alignment showed patches of sequence similarity with those groups mainly due to the proline, serine, and threonine (Pro > Ser = Thr) richness of the N-terminal domain of SIMPLE. The partially matched regions correspond to the mucin-like domains (Ser-Thr-rich) of Endolyn, CD168, and DC-LAMP and the hinge regions (Pro-Ser/Ser-Thr-rich) of LAMPs. There is little architectural similarity between SIMPLE and classic bipartite or semi-bipartite patterns of the extracellular domain of LAMP or the Endolyn family (41,43). Mucin-like domains and hinge regions of the above families of proteins are heavily N- and O-glycosylated (41,44), whereas SIMPLE is completely devoid of glycosylation. The C-terminal domain also does not show any similarity except for the presence of dileucine (LL) and YXXΦ motifs, one of which is invariably present in the cytoplasmic domain of the above-mentioned families. In the case of the LAMP family, YXXΦ is preceded by a G and is known to be critical for direct delivery from trans-Golgi network to lysosome (45). SIMPLE, EET-1, and TBX1 all possess the YXXΦ motif with a GT prefix, although the most conflicting aspect is the predicted type II orientation of SIMPLE. SIMPLE lacks the typical signal sequence and contains only one potential hydrophobic stretch that presumably works as a stop transfer signal. According to the charge difference rule (46) of TM topology the Δ(C-N) value is −2.5, indicating that the dileucine and YXXΦ motifs will be on the luminal side.
The charge difference is not the sole factor involved in membrane orientation; type II proteins may have a type III configuration (47,48), and in that case LL/YXX⌽ will be retained in the cytoplasmic tail, which remains to be experimentally verified for SIMPLE.
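The charge difference Δ(C-N) compares the net charge of the residues flanking the hydrophobic stretch on its C-terminal side with that on its N-terminal side. The following is a minimal sketch under stated assumptions, since the text does not give the parameters that were used: a flanking window of 15 residues, +1 for Lys and Arg, −1 for Asp and Glu, and an adjustable weight for His (default 0). The function names and the toy sequence are hypothetical; the sketch does not reproduce the −2.5 value reported for SIMPLE.

```python
def net_charge(segment, his_charge=0.0):
    """Sum the formal side chain charges in a peptide segment.

    Assumed weights: K/R = +1, D/E = -1, H = his_charge, all others 0.
    """
    charges = {"K": 1.0, "R": 1.0, "D": -1.0, "E": -1.0, "H": his_charge}
    return sum(charges.get(residue, 0.0) for residue in segment)

def charge_difference(sequence, tm_start, tm_end, flank=15, his_charge=0.0):
    """Return delta(C-N) for a hydrophobic stretch sequence[tm_start:tm_end].

    The flanks are the `flank` residues on either side of the stretch
    (0-based, end-exclusive indices); shorter flanks are used near the ends.
    """
    n_flank = sequence[max(0, tm_start - flank):tm_start]
    c_flank = sequence[tm_end:tm_end + flank]
    return net_charge(c_flank, his_charge) - net_charge(n_flank, his_charge)

# Hypothetical protein: basic N-terminal flank, 20-residue hydrophobic
# stretch, acidic C-terminal flank.
toy = "MKRKAAAA" + "L" * 20 + "DEAAAAAA"
delta = charge_difference(toy, tm_start=8, tm_end=28)
```

A negative value, as in this toy case, means the N-terminal flank is the more positive one, which under the rule favors a cytoplasmic N terminus and a luminal C terminus. As noted above, the rule is only one of several determinants of orientation.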
Induction of SIMPLE by Microbial Components-BCG cell wall components and LPS are potent effectors exerting maturation and survival signals for monocytes and dendritic cells, and these pathways are intimately linked to TLR2- and TLR4-mediated signaling (49). TLR2 has been characterized recently as a novel death receptor (50) that implies an analogous mechanism to the TNF receptor family; the signaling events of these receptors are bifurcated downstream, and they can modulate life and death upon ligand activation during infection. The genes regulated through these pathways leading to the final cellular response are mainly unknown. The putative promoter region of SIMPLE contains AP1- and p53-like binding motifs; if those are functional this could explain how SIMPLE could be induced through TLR/TNF receptor and p53-mediated pathways.
Relevance of SIMPLE Induction by M. bovis-During microbial infection one of the most predominant innate responses exerted by an immunocompetent host is the induction of apoptosis of the infected cells, thereby minimizing the spread and restricting the infection. The similarity of SIMPLE with PIG7, which is more than 10-fold increased in a p53-mediated apoptotic environment, and its localization in lysosomes, which play a role in the process of cell self-destruction, suggest SIMPLE could be involved in host cell apoptosis. In vitro studies have shown that apoptosis is responsible for intracellular killing of mycobacteria (51), and down-regulation of anti-apoptotic genes has been observed due to BCG or heat-killed Mycobacterium tuberculosis (52). Since promoting apoptosis is not beneficial for the growth of the bacilli, the components of programmed cell death are also impaired. It is evident from recent work that both live and heat-killed M. bovis BCG were capable of increasing the viability of monocytes through up-regulation of an anti-apoptotic gene A1 (53). Hence, it is not unlikely that many genes in apoptotic and anti-apoptotic pathways could be altered during mycobacterial infection. Several lines of evidence suggest that avirulent strains of Mycobacterium are most active in eliciting the apoptotic response, whereas the virulent strain bypasses this, and cells remain less apoptotic (54,55). If the expression of SIMPLE is connected to eliciting host cell apoptosis, the expression could be affected differentially by virulent and avirulent strains.
At this point, there is no direct evidence that PIG7 or SIMPLE is involved in apoptosis because their mechanism of action is unknown. However, SIMPLE as a lysosomal membrane protein and being proline-rich may raise an interesting possibility in view of lysosomal and ubiquitin-mediated intracellular protein degradation pathways (56-58). Lysosomal membrane protein LAPTM5 (59) that is specifically expressed in hematopoietic cells possesses a proline-rich carboxyl-terminal domain, and the domain has been found to interact with precursors of ubiquitin, leading to the concept that LAPTM5 mediates degradation of ubiquitinated protein in the lysosome. Another lysosomal membrane protein LAMP2/LGP96 (60) has been found to be a receptor for selective uptake of proteins into the lysosome and subsequent degradation. Recently, a unique sequence motif has been detected in the cytosolic tail of the LAMP2a isoform, which is required for the binding of substrate protein and is proposed to be important for chaperone-mediated autophagy by the receptor (61). Structural analysis of the proline-rich domain and the identification of interacting proteins for SIMPLE may provide important clues about the function of this gene. It is also intriguing that the rat homologue, EET-1, was rapidly induced by estrogen treatment in the rat uterus. Programmed cell death is an essential feature of normal ovarian and uterine cycles (62,63), and increased lysosomal activity is known during endometrial/luteal degeneration. Studying rodent reproductive tissue may reveal the functional aspect of SIMPLE as it lacks estrogen-responsive elements in the 3′-untranslated region, and its regulation by estrogen in human has yet to be defined.
FIG. 9C. Alignment of the C-terminal domain of SIMPLE with sequences from other species. Sequences were mainly obtained by searching protein data bases through BLASTP and PSI-BLAST programs available at the NCBI home page. Some sequences were also obtained by running the TBLASTN program against EST data bases using the C-terminal domain of SIMPLE (residues 91-161) as a query. Multiple alignment of protein sequences was performed using the MultAlin sequence alignment program and by manual adjustment. The consensus shown just below the alignment includes residues conserved in the majority (>90%, uppercase in red; >50%, lowercase in blue) of the aligned sequences. The 1st right-hand column indicates the total length (TL) known for each protein versus the sequence length selected for the comparison (CL), and the last right-hand column shows the % identity (I)/% positive (P) values obtained by BLOSUM62 matrix setting for each sequence with respect to the C-terminal domain of SIMPLE. Accession numbers for SIMPLE (Hum), TBX1 (Mus; Mus musculus) and EET-1 (Rat), and for chicken and Zebrafish (Zeb) were cited in the text. Gene identification or accession number has been included for Drosophila melanogaster (Dm), C. elegans (Ce), and Arabidopsis thaliana (At), and the EST for Bombyx mori (Bombyx) is AU004815.
In summary, identification of SIMPLE revealed that LITAF/PIG7 could encode the same protein as EET-1 provided a G residue was absent from a specific region of the coding sequence. We confirmed the absence of the G residue in an identical transcript, SIMPLE. SIMPLE belongs to a new family of proteins having a unique domain with two conserved sequence motifs. The gene was induced in antigen-presenting cells upon activation by microbial components, and a dramatic induction occurred during p53-mediated apoptosis. There is little information regarding the role of lysosomal membrane proteins in apoptosis and their alteration during infection. Characterization of SIMPLE as a novel lysosomal membrane protein puts forward this gene as a promising candidate for a novel role in programmed cell death and in host-microbial interaction.