Characterization of a Novel, Stage-specific, Invariant Surface Protein in Trypanosoma brucei Containing an Internal, Serine-rich, Repetitive Motif

A new surface membrane protein, an invariant surface glycoprotein termed ISG100, was identified in Trypanosoma brucei by catalyzed surface radioiodination of intact cells. This integral membrane glycoprotein was purified by a combination of detergent extraction, lectin-affinity chromatography, and ion-exchange chromatography followed by preparative SDS-polyacrylamide gel electrophoresis. The protein was expressed only in bloodstream forms of the parasite, was heavily N-glycosylated, and was present in different clonal variants of the same serodeme as well as in different serodemes. The gene for this protein was isolated by screening a cDNA expression library with antibodies against the purified protein followed by screening of a genomic library. The nucleotide sequence of the gene (4050 base pairs) predicted a highly reiterative polypeptide containing three distinct domains: a unique N-terminal domain of about 10 kDa containing three potential N-glycosylation sites, followed by a large internal domain consisting entirely of 72 consecutive copies of a serine-rich, 17-amino acid motif (~113 kDa), and terminating in an apparent transmembrane-spanning region of about 3.3 kDa. The internal repeat region of this gene (3672 base pairs) represents the largest reiterative coding sequence to be fully characterized in any species of trypanosome. There was no significant homology with other known proteins, and overall the predicted protein was extremely hydrophobic. Unlike the genes for other surface proteins, the gene encoding ISG100 was present as a single copy. Although present in the flagellar pocket, ISG100 was predominantly associated with components of the endocytic and exocytic pathways, such as intracellular vesicles located in the proximity of the pocket as well as a large, electron-lucent perinuclear digestive vacuole.

The African trypanosomes are a group of unicellular eukaryotes responsible for sleeping sickness in man and related diseases in animals. The life cycle of these parasites, which alternates between the mammalian host and the tsetse fly vector, is characterized by the stage-specific expression of two major surface glycoproteins, namely the variant surface glycoprotein (VSG) and procyclin, during the bloodstream and insect mid-gut stages, respectively. These two proteins cover the entire surface of the trypanosome and are by far the most abundant surface proteins during these stages of the life cycle. Although there is a considerable body of data in the literature concerning the structure, function, and differential expression of VSG and procyclin (1-3), only relatively recently (for review, see Overath et al. (4)) has attention focused on other surface proteins located either beneath the VSG/procyclin surface coat or within the flagellar pocket, a specialized invagination of the plasma membrane that is involved in the pathways for endo- and exocytosis in these cells. There are several reasons why interest in these proteins is increasing. First, some of these proteins must be involved in a range of cellular housekeeping processes, such as the uptake of high molecular weight ligands and the transport of solutes and ions. Second, given their functional role, such proteins are unlikely to undergo antigenic variation in the same manner and to the same extent as the VSG and, consequently, may represent suitable targets for the development of vaccines or chemotherapy.
Third, even if the development of effective vaccines against these invariant surface antigens proves to be either impossible or impractical, the fact remains that the plasma membrane of these cells represents an important biological interface between host and parasite and as such is likely to occupy a pivotal position in multiple signaling pathways, such as those involved in interactions between the mammalian host and arthropod vector, switching between different life cycle stages, and probably intercellular communication with other trypanosomes. Arguably, these topics are among the most exigent areas of research in molecular parasitology at present, and an important step toward improving our understanding of these processes must be the identification and characterization of the plasma membrane proteins that mediate them.
In this respect it is significant that, to date, apart from the VSG and procyclin, only three other surface proteins have been fully characterized at a molecular and functional level in Trypanosoma brucei: the heterodimeric transferrin receptor (5-7), a Ca²⁺-regulated adenylate cyclase (8,9), and the glucose transporter (10). Although functional assays have identified receptors for lipoproteins and lytic factors (11-13) as well as several plasma membrane enzymes (14,15), no data are available concerning the molecular nature of the proteins responsible. Conversely, the genes for several putative plasma membrane proteins, termed ESAGs for expression site-associated genes, have been identified in T. brucei; these appear to code for amphipathic polypeptides that contain typical membrane signal sequences and consensus sites for N-glycosylation, but the function and cellular location of the encoded proteins have yet to be established (16). Recently, more direct approaches involving surface labeling of whole cells by biotinylation or iodination have identified a number of bloodstream stage-specific, invariant surface glycoproteins, or ISGs, that were uniformly distributed over the entire surface of the trypanosome (17-19). The genes for two of these ISGs have been cloned; they are present as multiple copies and code for proteins with N-terminal leader sequences, hydrophilic extracellular domains, single transmembrane-spanning regions, and sites for N-glycosylation (20,21). However, no homology with other proteins was observed, and the function of these ISGs also remains unknown.
Here we describe the isolation, purification, and molecular characterization of a novel surface protein in T. brucei. Uniquely, this bloodstream stage-specific protein contains a large internal domain consisting entirely of 72 copies of a previously undescribed amino acid motif. Unlike the genes for other plasma membrane proteins described so far, the gene for this protein is present as a single copy.

EXPERIMENTAL PROCEDURES
Materials-All radiochemicals and nitrocellulose filters were obtained from Amersham International plc, Amersham, Buckinghamshire, United Kingdom. Protein A-Sepharose CL 4B and concanavalin A-Sepharose 4B were purchased from Sigma. N-Glycopeptidase F was obtained from Boehringer Mannheim. Alkaline phosphatase-conjugated and fluorescein isothiocyanate-labeled anti-rabbit IgG were obtained from Promega and Amersham, respectively. Other reagents were of the highest purity available.
Source and Purification of Trypanosomes-Cloned variants of T. brucei (MITats 1.1, 1.4, 1.5, ILTat 1.21, and AnTat 1.1) were grown in laboratory rats and purified as described previously (14). Procyclic forms of T. brucei were produced by transformation of MITat 1.1 in SDM-79 and maintained in the same medium as described in a previous report (22).
Surface Labeling of Trypanosomes-All surface labeling of whole cells was performed using lactoperoxidase/glucose oxidase or IODO-GEN and carrier-free Na¹²⁵I (500 µCi/10⁸ cells) as described previously (19).
Purification of ISG100-All steps were performed at 0-5°C unless otherwise stated. A sample of surface-radioiodinated cells (5 × 10⁸ cells) was mixed with a suspension of freshly harvested trypanosomes (~5 × 10¹⁰ cells) to provide a marker for ISG purification. The mixed suspension was washed twice in Tes buffer (50 mM, pH 7.5) containing NaCl (150 mM), and the pellet was resuspended in the same buffer (10 ml) containing leupeptin (30 µg/ml), PMSF (0.2 mM), E-64 (20 µM), TLCK (50 µM), and EDTA (1 mM). The cells were lysed by the addition of 100 ml of distilled deionized water (containing protease inhibitors) with gentle mixing. After incubation for 10 min at 37°C, 5 ml of NaCl (3 M) was added, and the suspension was centrifuged at 10,000 × g for 5 min. The pellet was washed twice in Tes buffer containing protease inhibitors, resuspended in the same buffer containing MgCl₂ (5 mM) and DNase I (100 units/1 × 10¹⁰ cells), and incubated at room temperature for 5 min. After DNase treatment, this total membrane fraction was centrifuged (10,000 × g, 5 min), and the pellet was washed twice and resuspended in 100 ml of Tris buffer (50 mM, pH 7.5) containing NaCl (150 mM), leupeptin (30 µg/ml), PMSF (0.2 mM), E-64 (20 µM), TLCK (50 µM), and EDTA (1 mM). Membranes were extracted by adding an equal volume of the same buffer containing Triton X-100 (2%, w/v), incubating on ice for 1 h, and centrifuging (50,000 × g, 2 h). The supernatant was applied (0.5 ml/min) to a column (2-ml bed volume) of concanavalin A-Sepharose previously equilibrated with Tris/NaCl buffer containing Triton X-100 (1%, w/v). The column was washed with Tris buffer (10 mM, pH 6.5) containing Triton X-100 (0.1%, w/v), and bound glycoproteins were eluted with 0.5 M α-methylmannoside in the same buffer. This eluate was applied (0.2 ml/min) to a DEAE-52 column (5-ml bed volume) equilibrated with Tris buffer (10 mM, pH 6.5) containing Triton X-100 (0.1%, w/v).
The column was washed with the equilibration buffer (six column volumes), and a linear gradient of KCl (0-0.5 M) in the same buffer was applied to the column (10 column volumes). Fractions (1 ml) were collected throughout, and their radioactivity was measured. Fractions corresponding to the peaks of radioactivity were analyzed by SDS-PAGE and autoradiography as described previously (19). The bands corresponding to ISG100, which were eluted from the column by the salt gradient, were excised, reequilibrated with SDS-PAGE sample buffer, and subjected to preparative SDS-PAGE. The resulting ISG100 bands, detected by autoradiography, were excised, rehydrated with water, and subsequently used to immunize rabbits.
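The osmotic lysis step above (hypotonic lysis in water, then restoration with 3 M NaCl) can be checked with simple mixing arithmetic. The sketch below is an approximation that assumes additive volumes and NaCl-free lysis water, and it ignores Tes, EDTA, protease inhibitors, and the volume of the cell pellet; these simplifications are ours, not the authors'.

```python
# Approximate NaCl concentration at each step of the osmotic lysis described above.
# Assumptions (not from the source): volumes are additive, the lysis water is NaCl-free,
# and the cell pellet volume and other buffer components are neglected.

def mix_nacl(parts):
    """Return the final NaCl molarity for a list of (volume_ml, nacl_molar) parts."""
    total_volume_ml = sum(v for v, _ in parts)
    total_mmol = sum(v * c for v, c in parts)  # ml * mol/l = mmol
    return total_mmol / total_volume_ml        # mmol/ml = mol/l

resuspension = (10.0, 0.150)   # 10 ml Tes/NaCl buffer, 150 mM NaCl
lysis_water = (100.0, 0.0)     # 100 ml distilled deionized water
salt_addition = (5.0, 3.0)     # 5 ml of 3 M NaCl

during_lysis = mix_nacl([resuspension, lysis_water])
after_salt = mix_nacl([resuspension, lysis_water, salt_addition])

print(f"NaCl during lysis: {during_lysis * 1000:.1f} mM")
print(f"NaCl after salt addition: {after_salt * 1000:.1f} mM")
```

On these assumptions the cells are exposed to roughly 14 mM NaCl during the 10-min lysis (about one-tenth of the starting 150 mM), and the 3 M NaCl addition brings the suspension back to about 143 mM before the membranes are pelleted.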
Raising Antisera-Polyclonal anti-ISG100 antibodies were raised in rabbits, and the IgG fraction was prepared exactly as described previously (19).
Gels, Immunoblotting, and Immunoprecipitates-SDS-PAGE, autoradiography, and Western blots were performed as described previously (19). Detergent lysates were prepared by the method of Anderson and Blobel (23). Immunoprecipitations were performed by mixing samples (100 µl) of detergent lysates, corresponding to 10⁷ cells, with the antisera (volume determined by titration but routinely 20 µl), followed by incubation for 30 min at 4°C. The antibody-treated lysates were then mixed with a slurry (1:1, v/v) of protein A-Sepharose CL 4B in phosphate buffer, using five volumes of slurry for each volume of antibody added. The mixture was agitated at 4°C for at least 2 h before centrifugation; washing of the resin and elution of the immune complexes were performed as described elsewhere (19).
Immunofluorescence Microscopy-Trypanosomes (4 × 10⁷ cells/ml) were fixed in paraformaldehyde (2%, w/v) in phosphate-buffered saline (PBS, pH 7.5) at 4°C for 20 min. The fixed cells were washed in PBS containing BSA (5%, w/v) and glycine (1 mg/ml). The cells were resuspended in PBS/BSA (2 × 10⁷ cells/ml), and 20 µl were spread gently and uniformly onto poly-L-lysine-coated glass slides and air dried. After blocking nonspecific protein binding sites with PBS/BSA, the cells were treated overnight with the antibodies diluted 1/150 in PBS/BSA containing Tween 20 (0.1%, w/v). The slides were washed with PBS/BSA prior to treatment with fluorescein isothiocyanate-labeled anti-rabbit globulin (1/500 in PBS/BSA) for 1 h at room temperature and washed again before mounting in 50% glycerol (w/v) in PBS containing 1,4-diazabicyclo[2.2.2]octane (25 mg/ml). The cells were viewed and photographed using a Zeiss Axioplan immunofluorescence microscope with a ×100 oil immersion Plan-Neofluar objective. High resolution immunofluorescence experiments were performed using a Zeiss Axioskop microscope equipped with a digital in situ imaging system.
Immunogold Electron Microscopy-Pleomorphic bloodstream forms of T. brucei were fixed for 15 min at room temperature in paraformaldehyde (4%, w/v) and glutaraldehyde (0.1%, w/v) diluted in cacodylate buffer (0.1 M, pH 7.2). After dehydration in methanol with progressive lowering of the temperature, they were embedded in Lowicryl, which was polymerized under UV light at −20°C. Ultrathin sections recovered on carbon-formvar nickel grids were floated on PBS containing BSA (1%) and goat serum (10%) for 1 h and then overnight at 4°C on PBS/BSA containing anti-ISG100 antiserum (1/100). After three washes in PBS/BSA containing Tween 20 (0.05%, w/v), the grids were floated for 1 h on PBS/BSA containing EM goat F(ab)2 anti-rabbit IgG (Biocell) or protein A (Sigma), each conjugated to 10-nm gold particles and diluted 1/100. The grids were washed three times in PBS/BSA containing Tween 20, rinsed in distilled water, and stained with uranyl acetate and lead citrate before being viewed in an AEI 6B electron microscope at 60 kV.
Cloning and Sequencing the Entire ISG100 Gene-A cDNA library was constructed in λgt11 using poly(A) mRNA isolated from bloodstream forms of T. brucei (AnTat 1.1) as described by Gubler and Hoffman (24) using the Amersham cDNA synthesis and cloning kits. A SalI T. brucei (AnTat 1.3) genomic library was constructed in EMBL12 according to standard methods (25). The λgt11 library was screened according to the supplier's instructions with a few modifications. Nonfat milk powder (5%, w/v) in Tris-buffered saline containing Nonidet P-40 (0.05%, w/v) and a bacterial lysate was used to block nonspecific binding to the nitrocellulose filters and to dilute the antibodies (1/1000 of the IgG fraction). The bacterial lysate was prepared by diluting 1 ml of an overnight culture of the host strain (Y1090) in 50 ml of LB and incubating at 37°C with shaking for 2-3 h. The bacteria were collected by centrifugation, resuspended in distilled deionized water (3 ml), lysed by sonication, and denatured by incubating the lysate at 100°C for 4 min. The lysate was diluted 1/100 prior to use. Alkaline phosphatase-conjugated goat anti-rabbit IgG was used at a dilution of 1/7500, in accordance with the supplier's instructions (Promega). Positive plaques were purified by repeated screening, and the insert cDNAs were subcloned into pUC vectors. Inserts of positive clones were sequenced on both strands by the dideoxy chain termination method (25) using the U. S. Biochemical Corp. Sequenase version 2.0 kit. Sequence analysis was performed using the GCG/Wisconsin software (26).
A partial cDNA obtained from the λgt11 library was used to screen an EMBL12 genomic library using ³²P-labeled probes according to standard methods (27), resulting in the isolation of a SalI genomic fragment (10 kb) containing the ISG100 gene. Subsequently, an NlaIII fragment (5.2 kb) that contained the complete coding region of the ISG100 gene was isolated from this SalI genomic fragment and subcloned into the SphI site of pUC18. Sequencing of the entire NlaIII fragment on both strands represented a major effort and was accomplished by a combination of primer walking where possible and the generation of a series of nested exonuclease III deletions using the Erase-a-Base® system (Promega). This process involved the isolation of clones containing the NlaIII fragment in both orientations in the SphI site of the pUC18 vector. Plasmid DNA prepared from these clones was digested with SacI and XbaI to generate Exo III-resistant 3′ and Exo III-susceptible 5′ overhangs, respectively. A nested set of deletions in the target DNA was then prepared by digestion with Exo III for various times according to the supplier's instructions. Since the reiterative region of the gene (~3.7 kb) lacked unique restriction sites, several hundred clones were screened; in each case the plasmid DNA was linearized by appropriate restriction enzyme digestion and subjected to careful size fractionation on agarose gels to obtain a set of deletion clones that differed by about 200 bp and spanned the entire repeat region. All of these clones were sequenced by the dideoxy chain termination method using the 3′-flanking vector sequence as a primer (reverse primer). Throughout, great care was taken in the sequencing and alignment of these clones. Alignment was aided to some extent by small variations in the sequence of the 51-bp repeat.
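The nested-deletion strategy can be viewed as a tiling problem: deletion endpoints spaced about 200 bp apart across the repeat region, each read from the reverse primer into the insert. The sketch below checks that such a set of reads covers the 3672-bp repeat without gaps. The 350-base usable read length is a hypothetical value chosen for illustration and is not given in the source. Because consecutive reads overlap by only a few nearly identical 51-bp units, overlap alone would place reads ambiguously; one plausible reading of the text is that read position was anchored by the gel-sized deletion endpoint, consistent with the emphasis on careful size fractionation.

```python
# Hypothetical tiling check for nested Exo III deletions across the ISG100 repeat.
# REPEAT_LENGTH and STEP come from the text; READ_LENGTH is an assumed value.
REPEAT_LENGTH = 72 * 51   # 3672 bp of tandem 51-bp repeats
STEP = 200                # approximate spacing between deletion endpoints
READ_LENGTH = 350         # hypothetical usable read length per clone

def deletion_endpoints(length, step):
    """Deletion endpoints (0-based) spaced `step` bp apart, starting at 0."""
    return list(range(0, length, step))

def uncovered_bases(length, starts, read_length):
    """Return the positions in [0, length) not covered by any read."""
    covered = [False] * length
    for start in starts:
        for pos in range(start, min(start + read_length, length)):
            covered[pos] = True
    return [pos for pos, ok in enumerate(covered) if not ok]

starts = deletion_endpoints(REPEAT_LENGTH, STEP)
gaps = uncovered_bases(REPEAT_LENGTH, starts, READ_LENGTH)
overlap = READ_LENGTH - STEP  # bases shared by consecutive reads

print(f"{len(starts)} deletion clones, {len(gaps)} uncovered bases, "
      f"{overlap}-bp overlap (~{overlap / 51:.1f} repeat units)")
```

With these illustrative values, 19 evenly spaced deletion clones are enough to cover the repeat, and each pair of neighbouring reads shares only about three repeat units.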
DNA and RNA Analysis-The procedures employed for the isolation of DNA and RNA as well as Southern and Northern blot hybridizations are described elsewhere (28).
ELISA of the Repeat Motif-A peptide corresponding to the consensus amino acid sequence predicted from the nucleotide sequence of the 51-bp repeat region of the ISG100 gene was synthesized. Its sequence, from N- to C-terminus, was GIAASSLLSSFASSSAV. A stock solution of peptide (5 mg/ml) was prepared in trifluoroacetic acid and immediately diluted in coating buffer (NaHCO₃, 35 mM; Na₂CO₃, 15 mM; pH 9.6) to give a range of peptide concentrations (0.05 to 20 µg/ml). Samples (100 µl) of these peptide solutions were used to coat the wells of a microtiter plate (Nunc) that had been precoated with poly-L-lysine to enhance adsorption of the peptides as described previously (29). The wells were washed three times with PBS and incubated with blocking buffer (Tris-buffered saline containing 2% BSA; TBS/BSA) for 30 min at 37°C. The wells were washed with TBS, and 100 µl of anti-ISG100 polyclonal antibodies, appropriately diluted in TBS/BSA, were added to each well and incubated at room temperature for 2 h. After washing with TBS/BSA, alkaline phosphatase-conjugated goat anti-rabbit IgG (100 µl, diluted 1/7500 in TBS/BSA) was added to each well and incubated for 1 h at room temperature, and each well was then washed several times with TBS. Finally, 100 µl of alkaline phosphatase assay buffer containing 4-nitrophenyl phosphate (2 mg/ml) was added to each well. The plates were incubated at room temperature for 30 min, and the absorbance of each well at 405 nm was measured using an automatic plate reader.
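As a check on the synthetic peptide (and on the peptide molecular weight of 1554 used for the copy-number estimate), the average mass of GIAASSLLSSFASSSAV can be computed from standard average amino acid residue masses plus one water. The sketch assumes an unmodified peptide with free N- and C-termini; only the residues present in the peptide are tabulated.

```python
# Average molecular mass of the synthetic repeat peptide from residue masses.
AVERAGE_RESIDUE_MASS = {  # Da, residue mass (free amino acid minus H2O)
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "V": 99.1326,
    "L": 113.1594, "I": 113.1594, "F": 147.1766,
}
WATER = 18.0153  # Da, added once for the free N- and C-termini

def peptide_mass(sequence):
    """Sum of average residue masses plus one water (unmodified, free termini)."""
    return sum(AVERAGE_RESIDUE_MASS[aa] for aa in sequence) + WATER

consensus = "GIAASSLLSSFASSSAV"
mw = peptide_mass(consensus)
print(f"{len(consensus)} residues, average mass {mw:.1f} Da")
# prints: 17 residues, average mass 1554.7 Da
```

The result agrees with the value of 1554 used for the peptide in the copy-number calculation.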
Fractionation and Purification of the Flagellar Pocket Membrane-All steps were performed at 4°C. Isolated trypanosomes (5 × 10¹⁰ cells) were washed twice in isotonic sucrose (sucrose, 250 mM; EDTA, 1.5 mM; KCl, 1 mM; Hepes, 5 mM, pH 7.5). The paste of cells was transferred to a chilled mortar, and the cells were disrupted by grinding with glass Ballotini beads (Sigma, acid-washed, 75-200 µm). Grinding was continued until about 75% disruption was achieved, as judged by phase contrast microscopy. The homogenate was resuspended (~10 ml) in isotonic sucrose containing leupeptin (40 µg/ml), PMSF (0.2 mM), E-64 (10 µM), and TLCK (50 µM). The glass beads were sedimented by centrifugation at 800 rpm (77 × g) for 1 min and washed twice with the same buffer. The combined supernatants were centrifuged at 2000 × g for 4 min, the supernatant was removed, and the loose pellet was washed once. The combined supernatants were centrifuged at 65,000 × g for 60 min. The high speed pellet was resuspended in no more than 3 ml of isotonic sucrose buffer and carefully layered onto the top of a discontinuous sucrose gradient consisting of sucrose solutions prepared in Hepes (5 mM, pH 7.5), KCl (1 mM), and EDTA (1.5 mM) and calculated to give the following final densities: 1.094 g/ml (8 ml), 1.113 g/ml (10 ml), 1.187 g/ml (8 ml), and 1.252 g/ml (6 ml). These densities were based on the median equilibrium densities of the flagellar pocket membrane (1.11 g/ml), microsomal/endoplasmic reticulum membrane (1.175 g/ml), and plasma membrane (1.22 g/ml) (31). The gradient was centrifuged at 58,000 × g for 90 min using an AH629 rotor. After centrifugation, the material banding at each density interface was removed; in order of increasing interface density, these bands corresponded to the flagellar pocket membrane, microsomal/endoplasmic reticulum membrane, and pellicular plasma membrane fractions. Each was diluted to 35 ml with isotonic sucrose buffer and centrifuged at 70,000 × g for 60 min.
The pellets were resuspended in 0.5-1.0 ml of isotonic sucrose buffer. The material corresponding to the flagellar pocket fraction, i.e. the material that banded at the 1.094-1.113 g/ml density interface, was layered onto a continuous sucrose gradient (20 to 45%, 30 ml) over a 54% sucrose cushion (3 ml), both prepared in the Hepes buffer. This gradient was centrifuged at 58,000 × g for 3 h using the AH629 rotor. After centrifugation the gradient was fractionated and screened for the presence of the heterodimeric transferrin receptor, a marker for the flagellar pocket (5), by Western blots and ELISAs using anti-ESAG7 antibodies. In general, the flagellar pocket membranes were visible as a distinct white band about 60% of the way down the gradient. These membranes were washed with isotonic sucrose buffer, resuspended in the same buffer (~1 mg/ml), and stored at −80°C. The flagellar pocket, microsomal/endoplasmic reticulum, and pellicular plasma membrane fractions prepared using this protocol represented 0.58 ± 0.25%, 0.48 ± 0.22%, and 4.1 ± 0.9%, respectively, of the total cellular protein present in the initial homogenate after removal of the glass beads (mean ± S.E. of four preparations).
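The layering of the discontinuous gradient follows from the median equilibrium densities cited above: each membrane type should band at the interface between the layer lighter than it and the layer denser than it. The sketch below makes that assignment explicit using only the densities given in the text. It is an idealization, since real membrane populations band over a range of densities rather than at a single median value.

```python
# Idealized assignment of membrane fractions to sucrose-gradient interfaces.
LAYERS = [1.094, 1.113, 1.187, 1.252]  # g/ml, top to bottom
MEMBRANES = {                          # median equilibrium densities, g/ml
    "flagellar pocket": 1.11,
    "microsomal/ER": 1.175,
    "plasma membrane": 1.22,
}

def banding_interface(density, layers):
    """Return the (upper, lower) layer pair that brackets `density`, or None."""
    for upper, lower in zip(layers, layers[1:]):
        if upper < density < lower:
            return (upper, lower)
    return None

assignments = {name: banding_interface(d, LAYERS) for name, d in MEMBRANES.items()}
for name, interface in assignments.items():
    print(f"{name:>17}: {interface[0]}-{interface[1]} g/ml interface")
```

The flagellar pocket membrane is assigned to the 1.094-1.113 g/ml interface, matching the fraction that is taken forward to the continuous gradient.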

RESULTS
ISG100 Is a Bloodstream Stage-specific Glycoprotein-In a previous report we identified two invariant surface proteins in bloodstream forms of T. brucei using surface radioiodination, but noted that other, less abundant surface proteins were also labeled (19). The final steps in the purification of one of these proteins, termed ISG100 for invariant surface glycoprotein with Mr ~100,000, are presented in Fig. 1. ¹²⁵I-Labeled surface glycoproteins that were specifically eluted from a concanavalin A-Sepharose column were fractionated by ion-exchange chromatography on DEAE-52. Approximately 60% of the applied radiolabeled protein did not adhere to the resin, while the remainder was eluted by the application of a linear salt gradient (Fig. 1A). The majority of retained proteins eluted at relatively low salt concentrations (<150 mM), and SDS-PAGE analysis of these fractions demonstrated that the ¹²⁵I label was primarily associated with three distinct bands (Fig. 1B). Two of these proteins migrated with apparent Mr values comparable to those reported previously for ISG70 and ISG64 (19) and were well resolved from the slower migrating ISG100. Significantly, only a single band of radioactivity was obtained when the material corresponding to ISG100 was excised from these gels and reelectrophoresed on a preparative SDS-PAGE gel. In a typical preparation, 6.7 ± 0.3 µg of purified ISG100 was obtained from 36 ± 2.5 mg of the total membrane fraction obtained after osmotic rupture of cells.
Antibodies against purified ISG100 allowed the specific immunoprecipitation of the protein from detergent extracts of cells surface-labeled with ¹²⁵I or metabolically labeled with [³⁵S]methionine (Fig. 2A). Significantly, all of the ISG100 was recovered in the membrane fraction following osmotic rupture of cells (Fig. 2A). Repeated washing of this fraction with NaCl (0.5 M) failed to release detectable amounts of ISG100 (data not shown), whereas the protein was readily solubilized by detergents, as observed during the purification procedure. These data indicated that ISG100 was an integral membrane protein rather than a peripheral protein or one attached to the membrane by a glycosylphosphatidylinositol anchor in the same manner as the VSG.
An unusual feature of ISG100 was its relative resistance to proteolytic cleavage in whole cells. Following mild trypsin treatment of intact radioiodinated cells, all of the labeled ISG100 remained associated with the cellular pellet, and there was no evidence of proteolytic degradation of the protein (Fig. 2A). Similar results were obtained with low concentrations of papain, pepsin, and chymotrypsin (data not shown), whereas other ISGs and the VSG were readily cleaved from the cell surface under these conditions (19). However, purified ISG100 was sensitive to these proteases (data not shown). Immunoprecipitation of ISG100 metabolically labeled with [³⁵S]methionine (Fig. 2A) demonstrated that the protein was stage-specific and was expressed only in bloodstream forms of T. brucei. These results also eliminated the possibility that ISG100 was a host protein adsorbed to the surface coat of the cell.
The binding and specific elution of ISG100 from concanavalin A-Sepharose indicated the presence of covalently bound carbohydrate. This view was supported by the effect of treating the immunoprecipitated protein with endoglycopeptidase F, which resulted in the complete loss of the 100-kDa band and the concomitant appearance of a band with apparent Mr ~55,000 (Fig. 2B, lane 2). An intermediate band, Mr ~70,000, was also detected when shorter incubations were employed, suggesting the presence of two distinct N-linked glycosyl chains (data not shown). The large difference in the electrophoretic mobility of the protein after deglycosylation clearly indicated the presence of extensive N-linked carbohydrate chains. The antibodies also efficiently recognized the deglycosylated form of the protein (data not shown). Finally, antibodies against ISG100 purified from the MITat 1.1 clone also allowed the specific immunoprecipitation of the same labeled protein from other clonal variants (Fig. 2B).
Isolation of the Gene Coding for ISG100-A T. brucei bloodstream form expression library was screened with polyclonal antibodies against purified ISG100. Approximately 2 × 10⁵ clones were screened, and 17 positive clones were obtained, which were purified by repeated screening. Restriction digest and Southern blot analysis of the DNA prepared from these clones indicated that the cDNA inserts were all related, and the largest of these inserts (~2.1 kb) was subcloned into pUC18. Sequence analysis of this insert demonstrated that it was composed almost entirely of a 51-bp repeat element, with the exception of a unique sequence of about 100 bp located at one end (Fig. 3). The repeat region lacked sites for most restriction enzymes, including frequent cutters such as RsaI, but digestion with PvuII released a unique 51-bp repeat fragment (data not shown). Although this cDNA was incomplete and lacked both a poly(A) tail and the 5′ mini-exon common to all trypanosomal mRNAs, the nucleotide sequence suggested that this region encoded a polypeptide containing a repetitive motif.
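The release of a unit-length 51-bp fragment by PvuII implies one PvuII site (CAG^CTG, blunt cut) per repeat unit. The sketch below illustrates this with a hypothetical 51-bp unit back-translated from the consensus motif GIAASSLLSSFASSSAV using hand-picked codons; it is not the actual ISG100 nucleotide sequence. A small codon table covering only the codons used confirms that the invented unit encodes the motif.

```python
import re

# HYPOTHETICAL 51-bp repeat unit (not the real ISG100 sequence): codons were chosen
# by hand so that the unit encodes the consensus motif and contains one PvuII site.
UNIT = "GGTATTGCTGCATCTTCCCTTCTCTCAAGTTTTGCCAGCTCCTCAGCTGTT"
CODONS = {  # only the codons used in UNIT
    "GGT": "G", "ATT": "I", "GCT": "A", "GCA": "A", "GCC": "A", "TCT": "S",
    "TCC": "S", "TCA": "S", "AGT": "S", "AGC": "S", "CTT": "L", "CTC": "L",
    "TTT": "F", "GTT": "V",
}
PVUII_SITE = "CAGCTG"  # cuts between CAG and CTG, leaving blunt ends

def translate(seq):
    """Translate an in-frame sequence using the small codon table above."""
    return "".join(CODONS[seq[i:i + 3]] for i in range(0, len(seq), 3))

def pvuii_fragments(seq):
    """Return fragment lengths after cutting at every PvuII site in `seq`."""
    cuts = [m.start() + 3 for m in re.finditer(PVUII_SITE, seq)]
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:])]

array = UNIT * 72                      # 72 tandem copies, as in the ISG100 gene
fragments = pvuii_fragments(array)
internal = fragments[1:-1]             # fragments flanked by two PvuII cuts
print(translate(UNIT), len(fragments), set(internal))
```

Every fragment flanked by two cuts is exactly one unit long, whatever the position of the site within the unit; only the two end fragments differ, and in a real clone these would include flanking unique sequence.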
To clone the entire ISG100 gene, a SalI genomic library (1.5 × 10⁵ clones) was screened with the partial cDNA, resulting in the isolation of five positive clones, all of which contained the same 10-kb SalI fragment. Digestion of this genomic fragment with NlaIII yielded a unique fragment that strongly hybridized with the ISG100 cDNA probes (Figs. 3 and 6). This fragment was subcloned into the SphI site of pUC18, and the entire region was carefully sequenced on both strands (see "Experimental Procedures" for details).
Predicted Amino Acid Sequence-Analysis of the sequence of this NlaIII genomic fragment (5.2 kb) predicted a single, large open reading frame of 4050 bp (Fig. 4A), in good agreement with the size of the single transcript (~4.4 kb) detected on Northern blots (see Fig. 6B). This open reading frame started and finished with initiation and termination codons located at positions 760 and 4810 downstream from the NlaIII site and consisted of a unique 282-bp 5′ sequence, followed by a long internal sequence (3672 bp) consisting entirely of 72 copies of a 51-bp repeat element, and ended with a short nonreiterative 3′ sequence of 96 bp (Fig. 4A). Thus, the predicted polypeptide (~127 kDa) consisted of three distinct domains: an N-terminal domain of about 10 kDa that contained three potential N-glycosylation sites, followed by a large internal domain consisting of 72 consecutive copies of a serine-rich, 17-amino acid motif (~113 kDa), and terminating in a short C-terminal region of about 3.7 kDa (Fig. 4B).
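The domain boundaries quoted above can be checked for internal consistency with simple bookkeeping on the stated nucleotide lengths. The sketch assumes that the 4050-bp open reading frame includes the termination codon and that position 760 marks the first base of the initiation codon; both are our interpretations of the text rather than stated facts.

```python
# Bookkeeping on the ISG100 open reading frame, using lengths stated in the text.
N_TERMINAL_BP = 282
REPEAT_UNIT_BP, REPEAT_COPIES = 51, 72
C_TERMINAL_BP = 96
START = 760  # position of the initiation codon downstream of the NlaIII site

repeat_bp = REPEAT_UNIT_BP * REPEAT_COPIES
orf_bp = N_TERMINAL_BP + repeat_bp + C_TERMINAL_BP
codons = orf_bp // 3
residues = codons - 1                # assumes the final codon is the stop codon

# Every segment should be a whole number of codons so the repeat stays in frame.
in_frame = all(n % 3 == 0 for n in (N_TERMINAL_BP, repeat_bp, C_TERMINAL_BP))

print(f"repeat region: {repeat_bp} bp; ORF: {orf_bp} bp; "
      f"ends at position {START + orf_bp}; {residues} residues; in frame: {in_frame}")
```

The segment lengths add up to the 4050-bp reading frame ending at position 4810. Dividing the whole-protein molecular weight used later in the text (126,439) by the 1349 residues implied here gives a mean residue mass of about 94 Da, lower than the ~110 Da typical of proteins, as expected for a sequence dominated by serine and alanine.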
To confirm the amino acid sequence, a synthetic peptide was prepared corresponding to the repeat motif predicted from the nucleotide sequence to be in frame with the initiation codon. This serine-rich peptide was readily recognized in ELISAs by polyclonal antibodies raised against purified ISG100 (Fig. 5A). Moreover, the predicted polypeptide contained only a single tyrosine residue, located three residues from the C-terminal histidine and 16 amino acids downstream from the single tryptophan present in the entire protein. This sequence was also confirmed by peptide mapping of purified ISG100 with N-chlorosuccinimide (100 mM) under conditions specific for cleavage at tryptophanyl peptide bonds (19), which resulted in the complete loss of the ¹²⁵I label but only a slight decrease in Mr, as judged by autoradiography and Western blot analysis of the treated protein (data not shown). This is the result expected if the only tyrosine, the presumed site of radioiodination, lies within the short C-terminal peptide released by cleavage at the single tryptophan. A hydrophobicity plot demonstrated that ISG100 was extremely hydrophobic, mainly owing to the hydrophobic nature of the repeat motif (Fig. 5B). Interestingly, the N-terminal region lacked an obvious hydrophobic leader sequence, while the short hydrophobic C-terminal domain was predicted to form a transmembrane-spanning region (see "Discussion"). Although searches of the protein databases did not reveal significant homology with any other known sequences or motifs, certain aspects of the reiterative region, which represents the largest such coding sequence to be fully characterized in any species of trypanosome, are worthy of comment. First, the sequence was highly conserved: 12 of the 17 residues of the motif, including the central serine-rich core of 10 amino acids, were absolutely invariant throughout all 72 repeats. Second, sequence variability of the repeat motif was concentrated in the terminal three residues and, to a far lesser extent, in the second and fourth positions of each repeat (Fig. 4C).
Third, amino acid substitutions in the repeat were characterized by a high degree of specificity and positional fidelity. For example, the only variable serine and leucine residues in the motif were always replaced by a cysteine and a proline, respectively, while the terminal residue of each repeat was either threonine or alanine (Fig. 4C). Fourth, positional variations appeared to occur in blocks within the reiterative region, as best exemplified by the substitution of methionine for isoleucine at position 15 of the repeat motif. Finally, codon usage in the 51-bp element of each repeat unit was highly conserved, both in the position and in the choice of codons, as best demonstrated by the serine codons. The ELISA response to the synthetic peptide (Fig. 5A) was used as a calibration curve to estimate the amount of repeat motif in samples of ISG100 obtained by electroelution of the band from SDS-PAGE gels of immunoprecipitates of ¹²⁵I-labeled cells. These samples were assayed by ELISA as described for the synthetic peptide, giving a value of 32.8 ± 6.6 ng of repeat per 10⁷ cells. Given molecular weights of 1554 for the peptide and 126,439 for the whole protein, it was calculated from this result that each cell contains 17,600 ± 3,600 copies of ISG100, 3-fold fewer than reported for ISG70 (19). This estimate is approximate, since it assumes that the antibodies were directed primarily against the repeat motif and that the synthetic peptide and the whole protein respond identically in ELISAs. Nevertheless, the low copy number was consistent with the low yield of purified protein and with the small number of clones obtained from screening cDNA libraries, compared with the more abundant ISG70.
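The copy-number arithmetic can be reproduced with a short sketch. It assumes, as the text does, that each ISG100 molecule carries 72 repeat units and that each unit reacts in the ELISA like one molecule of the synthetic peptide; the helper names `peptide_mass` and `copies_per_cell` are hypothetical, and the average residue masses are standard values rather than figures from this study.

```python
# Average residue masses in Da (free amino acid minus one water).
AVG_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "V": 99.1326,
    "I": 113.1594, "L": 113.1594, "F": 147.1766,
}
WATER_MASS = 18.0153        # one water restores the free peptide termini
AVOGADRO = 6.02214076e23    # molecules per mole


def peptide_mass(seq: str) -> float:
    """Average mass (Da) of a linear peptide built from the residues above."""
    return sum(AVG_RESIDUE_MASS[aa] for aa in seq) + WATER_MASS


def copies_per_cell(ng_repeat: float, peptide_mw: float,
                    repeats_per_protein: int = 72, cells: float = 1e7) -> float:
    """Hypothetical helper: ISG100 molecules per cell from ELISA repeat-equivalents.

    Assumes every repeat unit in the protein reacts like one peptide molecule.
    The conversion is linear, so it applies to the SD as well as the mean.
    """
    repeat_units = ng_repeat * 1e-9 / peptide_mw * AVOGADRO  # molecules in sample
    return repeat_units / cells / repeats_per_protein


if __name__ == "__main__":
    print(f"peptide mass: {peptide_mass('GIAASSLLSSFASSSAV'):.1f} Da")
    mean = copies_per_cell(32.8, 1554)
    sd = copies_per_cell(6.6, 1554)
    print(f"ISG100 copies per cell: {mean:,.0f} +/- {sd:,.0f}")
```

Run as a script, this gives a peptide mass of about 1554.7 Da and roughly 17,650 ± 3,550 copies per cell, in line with the 17,600 ± 3,600 reported above. In this formulation the whole-protein molecular weight (126,439) cancels out; it would be needed only to express the result as nanograms of protein.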
Genomic Organization and Transcription of the ISG100 Gene: Genomic DNA from T. brucei AnTat 1.1 was digested with several restriction enzymes, including PvuII. The resulting pattern was consistent with the gene being present as a single copy (Fig. 6A). Zooblots demonstrated that the ISG100 gene was also present in the subspecies T. brucei rhodesiense and T. brucei gambiense and in Trypanosoma evansi, but was absent from other trypanosomatids such as Trypanosoma congolense, Trypanosoma cruzi, and Crithidia fasciculata (data not shown). The stage-specific expression of ISG100 was confirmed by Northern blots, which showed a single transcript (~4.4 kb) in bloodstream long slender and stumpy forms but none in procyclic forms (Fig. 6B).
Cellular Localization of ISG100: Antibodies against ISG100 did not react with live cells, but the protein was readily detected by immunofluorescent staining of fixed cells (Fig. 7A). Although the limited resolution of these experiments precluded an unequivocal cellular localization, it was clear that ISG100 was not uniformly distributed over the surface of the cell, as observed for ISG70 (19), but was concentrated as an intense spot in the posterior region of the cell close to the expected position of the flagellar pocket. This view was supported by Western blots (Fig. 7B), which demonstrated an enrichment of ISG100 in the flagellar pocket fraction as opposed to the pellicular surface membrane, as was also the case for ESAG7, a component of the heterodimeric transferrin receptor (5,7). The distribution of both of these proteins differed from that of the Ca2+-ATPase of the endoplasmic reticulum (TBA1) (32) and of ISG70 (19), which were localized predominantly in the endoplasmic reticulum and plasma membrane fractions, respectively. An unequivocal cellular localization of ISG100 was obtained by immunogold electron microscopy (Fig. 8) and high resolution immunofluorescence (Fig. 9). In addition to the flagellar pocket membrane and the membranous and vesicular material in the lumen of the flagellar pocket, gold particles were predominantly associated with the membrane and heterogeneous contents of a large, electron-lucent digestive vacuole and with collecting tubules located close to the flagellar pocket (33). This cellular location was confirmed by high resolution immunofluorescence of cells stained with 4′,6-diamidino-2-phenylindole to show the positions of the kinetoplast and nucleus (Fig. 9). Most of the ISG100 fluorescence was concentrated in a large perinuclear compartment but also extended, with decreasing intensity, posteriorly toward the flagellar pocket. Indeed, the fluorescence distribution of ISG100 was almost identical to that observed for certain host apolipoproteins taken up by the endocytic pathway in trypanosomes (34).

FIG. 5. ELISA analysis of the repeat motif and hydropathy plot of ISG100. A, ELISA over a range of concentrations of the peptide corresponding to the serine-rich repetitive motif (GIAASSLLSSFASSSAV) of ISG100, using polyclonal antibodies against purified ISG100 (diluted 1/600). No signal above background was detected in the absence of anti-ISG100 antibodies or with preimmune serum over the concentration range of peptide employed. The inset presents the ELISA response as a function of antibody dilution at a fixed amount of peptide (5 ng/well). Each measurement represents the mean of triplicate determinations. B, hydropathy plot for ISG100 derived from the predicted amino acid sequence by the method of Kyte and Doolittle (30). The N- and C-terminal regions that flank the internal repeat are indicated by solid lines.

DISCUSSION

A novel, stage-specific, invariant surface glycoprotein from bloodstream forms of T. brucei, termed ISG100, has been identified, purified, and characterized. This protein differs significantly from all previously characterized surface proteins in trypanosomes (4). First, ISG100 contains a large internal domain composed entirely of a previously undescribed serine-rich repetitive motif, which represents 91% of the coding frame. Second, the protein is not uniformly distributed over the cell surface but is specifically associated with elements of the endocytic pathway, including the flagellar pocket, intracellular vesicles, and a large perinuclear, lysosome-like vacuole. Third, ISG100 is encoded by a single-copy gene, whereas all other trypanosomal plasma membrane proteins characterized so far are encoded by genes present as tandem repeats and/or as members of multigene families (4).
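The 91% figure follows from numbers given in the abstract (72 copies of a 17-residue motif in a 4050-bp gene), as a short sketch shows. It assumes the 4050-bp figure is the length of the coding frame, which is how the percentage reads; `repeat_coding_fraction` is a hypothetical helper name.

```python
def repeat_coding_fraction(copies: int, motif_aa: int,
                           coding_bp: int) -> tuple[int, float]:
    """Return (repeat length in bp, fraction of the coding sequence it occupies)."""
    repeat_bp = copies * motif_aa * 3   # three nucleotides per codon
    return repeat_bp, repeat_bp / coding_bp


if __name__ == "__main__":
    # Figures from the abstract: 72 copies of a 17-residue motif, 4050-bp gene.
    bp, frac = repeat_coding_fraction(72, 17, 4050)
    print(f"repeat region: {bp} bp ({17 * 3} bp per unit), {frac:.0%} of the gene")
```

This prints a 3672-bp repeat region built from 51-bp units and occupying 91% of the gene, matching the values stated in the abstract and above.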
In addition to procyclin (3), several other reiterative proteins have been identified in African trypanosomes. Most of these proteins are associated with the cytoskeleton, are not stage-specific, and have repeats with periodicities larger than that of the repeat element in ISG100 (35)(36)(37)(38). Two other membrane proteins containing internal repeats, although not fully characterized, have been identified in T. brucei. One of these, termed CRAM, was located in the flagellar pocket (39), while the other was present in the membranes of the vesicular network that underlies the flagellar pocket (40). Although these proteins share a cellular location similar to that of ISG100, both contain shorter repeat units (8 or 12 residues) that are very hydrophilic and rich in acidic amino acids. Indeed, hydrophilic repeats with a preponderance of acidic amino acids are typical of reiterative domains in several parasitic taxa (41)(42)(43)(44)(45), and it has been suggested that they play a role in perturbing the immune response of the host (46,47). These considerations argue for a different functional role for ISG100, which may represent a new class of reiterative proteins in parasitic protozoa containing hydrophobic rather than hydrophilic repeats.
There was a significant discrepancy between the size of ISG100 predicted from the open reading frame (~127 kDa) and the Mr estimated from its electrophoretic mobility on SDS-PAGE gels (~100,000), which was even more apparent after deglycosylation (~55,000). Several observations suggest that this discrepancy was not due to proteolytic cleavage of the protein, as observed for the low density lipoprotein receptor in T. brucei (48). First, the purification was performed in the presence of a broad range of protease inhibitors. Second, only a single band (Mr = 100,000) was detected in Western blots of whole cells and in immunoprecipitations from detergent lysates prepared by boiling in SDS, with no indication of proteolytic degradation under these conditions. Third, this band contained both the N-linked sugars and the ¹²⁵I label, and the amino acid sequence showed that the only potential N-glycosylation sites and the only tyrosine available for radioiodination were located in the N-terminal domain and at the C terminus, respectively; a band carrying both must therefore span the entire polypeptide. Thus, the 100-kDa band almost certainly represents the complete protein. The most likely explanation for the discrepancy is aberrant electrophoretic mobility of ISG100 on SDS-PAGE gels. Indeed, this is a feature of certain hydrophobic proteins, such as the lac permease, which appears to bind relatively large amounts of SDS and has an unusually high electrophoretic mobility on SDS-PAGE gels (49).
The presence of three distinct domains in ISG100 allows a possible model to be constructed for the organization of the protein in the membrane. Although the short, nonrepetitive N-terminal domain lacked an obvious hydrophobic leader sequence, this relatively hydrophilic region may represent an extracellular domain, since treatment with endoglycopeptidase F demonstrated that ISG100 contains N-linked carbohydrate. Significantly, this N-terminal region contains the only potential N-glycosylation sites in the entire protein, and since N-glycosylation normally takes place only on the luminal side of the endoplasmic reticulum, this region is likely to be externally disposed (50). The hydrophobic nature of the repeat region suggests that this reiterative domain is associated with the lipid bilayer, since thermodynamic considerations make it unlikely that such a long hydrophobic stretch could be easily accommodated in an aqueous environment (51). How, then, might the repeat motif be arranged in the membrane? A classic α-helical membrane-spanning segment with connecting loops seems an implausible arrangement for the repeat motif, since it is probably too short (17 residues, as opposed to the 25 or more required for such a configuration). Moreover, such a large number of consecutive membrane-spanning elements in a single protein would be unprecedented. An alternative possibility is that the repeat element adopts a monotopic configuration (52), with hairpin loops that are hydrophobically associated with the membrane but do not pass all the way through the bilayer. Another possibility is that this repetitive domain forms a membrane-associated β-strand structure, as has been proposed for some well-studied membrane proteins such as the adenine nucleotide translocator (53,54).
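The statement that the N-terminal region holds the only potential N-glycosylation sites rests on scanning the sequence for sequons. A minimal sketch of such a scan is shown below, using the common Asn-X-Ser/Thr rule (X not Pro); stricter variants of the pattern exist, and this is not necessarily how the sites were identified here. The consensus repeat is used as a stand-in for the 72 real repeats, the short N-terminal sequence is purely hypothetical (not ISG100 sequence), and `sequon_positions` is a hypothetical helper name.

```python
import re

# Classic N-glycosylation sequon: Asn-X-Ser/Thr, where X is any residue but Pro.
# The lookahead lets overlapping sequons (e.g. "NNST") all be reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def sequon_positions(seq: str) -> list[int]:
    """Return 0-based indices of Asn residues that start an N-X-S/T sequon."""
    return [m.start() for m in SEQUON.finditer(seq.upper())]


if __name__ == "__main__":
    repeat_domain = "GIAASSLLSSFASSSAV" * 72   # consensus stand-in for the repeats
    toy_n_terminus = "MNGSANPTQ"               # hypothetical, not ISG100 sequence
    print(sequon_positions(repeat_domain))      # no Asn, so no sequons
    print(sequon_positions(toy_n_terminus))     # NGS counts, NPT is excluded
```

Because the consensus motif contains no asparagine, the repeat domain yields no sequons at all, which is consistent with every potential N-glycosylation site lying outside the reiterative region.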
Whatever the arrangement of this region, it seems reasonable to assume that the basic structural organization of the reiterative domain is determined by the invariant, serine-rich central 10-amino acid core of the repeat motif, with the substitutions at the variable positions contributing to the folding of this core. The C-terminal domain consists of 32 residues and includes a hydrophobic stretch predicted to be long enough to form a transmembrane-spanning region. In support of this view was the finding that the only tyrosine residue in the entire protein was located three residues from the C-terminal histidine. This tyrosine must be accessible to the external medium, since ¹²⁵I surface labeling was performed under conditions where only externally disposed tyrosine residues are iodinated (19). Thus, it seems reasonable to conclude that the C-terminal region consists of a transmembrane-spanning segment with the terminal residues exposed to the external environment. Taken together, these data lead to a topographic model for ISG100 in which the internal repeat domain is associated with the lipid bilayer, possibly only on one side, and is flanked by a short extracellular N-terminal domain carrying extensive N-linked carbohydrate chains and by a transmembrane span with the extreme C-terminal residues exposed at the external surface (Scheme 1). This topology is consistent with the resistance of the protein to externally added proteases, as well as with the fact that the protein was not accessible to antibodies in intact cells but was readily radioiodinated.

SCHEME 1. Schematic representation of the proposed topographic organization of ISG100 in the plasma membrane. The possible organization of ISG100 in the surface or vacuolar membrane is shown (not to scale). The hydrophobic reiterative region is represented as a boxed region within the bilayer; the proposed extracellular N-terminal domain, containing the two N-linked chains (Y), the C-terminal transmembrane-spanning region, and the five N- and C-terminal amino acids are also shown.

FIG. 9. High resolution immunolocalization of ISG100. Fixed bloodstream forms were examined by immunofluorescence microscopy using antibodies against ISG100. After washing, the cells were mounted in PBS containing glycerol (50%, v/v) and 4′,6-diamidino-2-phenylindole (0.1 μg/ml) to indicate the nuclear and kinetoplast DNA. The cells were viewed with a Zeiss Axioskop microscope equipped with a digital in situ imaging system. The positions of the nuclear and kinetoplast DNA are indicated by arrows marked N and K, respectively, while the ISG100 fluorescence associated with the perinuclear vacuole and the flagellar pocket is indicated by the arrow and arrowhead, respectively.
The lack of significant homology with any known protein sequences or motifs means that no function can yet be ascribed to this protein with certainty. Nevertheless, since expression of ISG100 is restricted to the bloodstream form, it seems reasonable to assume that it is required for life in the mammalian host. The cellular localization of ISG100 demonstrated that the majority of the protein was located in a perinuclear, lysosome-like vacuole and in smaller endosome-like vesicles, and labeling with ¹²⁵I probably occurs only when the protein is present in the flagellar pocket membrane. Whether ISG100 can cycle between these intracellular compartments and the flagellar pocket, as has been reported for another protein termed CB1-gp (55), remains to be established. Interestingly, CB1-gp shares a number of features with ISG100. For example, this protein was also bloodstream stage-specific, contained extensive N-linked carbohydrate chains, and could be surface-labeled with sulfosuccinimidyl-6-(biotinamido)hexanoate while in the flagellar pocket of whole cells. Whether the two proteins are related is uncertain, since no sequence data are available for CB1-gp, but this glycoprotein appears to have a significantly higher Mr (~180,000) than ISG100.
Given its cellular location, it is tempting to speculate that ISG100 may play a role in the pathways for endocytosis. However, it seems unlikely that ISG100 is a receptor for ligand uptake, since only a small region of the protein may actually be exposed to the extracellular environment. Alternatively, in view of the repetitive nature of the protein, it seems more likely that ISG100 has a structural role in the compartments involved in intracellular traffic in T. brucei. Gene knockout experiments, designed to take advantage of the fact that the gene is present as a single copy, are currently under way in an attempt to establish the function of this unusual surface protein.