The protozoan Trypanosoma cruzi has a family of genes resembling the mucin genes of mammalian cells.

Mucins are heavily O-glycosylated Thr/Ser/Pro-rich molecules. Given their relevant functions, mucins and their genes have been mainly studied in higher eukaryotes. In the protozoan parasite Trypanosoma cruzi, mucin-like glycoproteins were shown to play an important role in the interaction with the surface of the mammalian cell during the invasion process. We show now that this parasite has a family of putative mucin genes, whose organization resembles the one present in mammalian cells. Different parasite isolates have different sets of genes, as defined by their central domain. Central domains, rich in codons for Thr and/or Ser and Pro residues, are made up of either a variable number of repeat units in tandem or non-repetitive sequences. Conversely, 5′- and 3′-ends from different genes in different isolates have similar sequences, suggesting their common origin. Comparison of deduced amino acid sequences revealed that all members of the family have the same putative signal peptide on the N terminus and a putative sequence for glycophosphatidylinositol anchoring on the C terminus. The deduced molecular mass of the core proteins is small (from 17 to 21 kDa), in agreement with the 1-kilobase size of the mRNA detected. Putative mucin genes in T. cruzi are located on large chromosomal bands of about 1.6-2.2 megabase pairs.

Mucins are highly glycosylated proteins expressed by most secretory epithelial tissues in vertebrates. They consist of a core protein moiety to which a number of carbohydrate chains are attached at serines and threonines by α-1-3 O-glycosidic bonds (1). The complex structure of these glycoproteins made the identification of genes encoding the protein moiety more difficult. However, several MUC-like genes have been isolated recently because they have a defined basic structure and sequence, which allows their inclusion in a gene family. MUC-like genes in vertebrates are essentially composed of a central domain and 5′- and 3′-flanking sequences (2,3). The central domains, comprising up to 70% of the coding sequences, are composed of tandemly repeated units enriched in codons for Ser and Thr, which are the target sites for O-glycosylation in the protein product, as well as Pro residues (4). Sequences flanking the central domain, on the 5′- and 3′-ends of mucin genes, lack repeated sequences.
The percentage of amino acid identity among different mucin core proteins is low. No substantial identities were found among the repeats in different molecules. The repeats are unique in size and sequence for each member of the mucin family, even though they contain many Ser and Thr residues, suggesting that their only function is to serve as a scaffold for O-linked glycans (3). Furthermore, different individuals have a variable number of repeated units in homologous core proteins, making the loci coding for mucins highly polymorphic among individuals (5,6). Partial sequence identities were found in defined regions of the N and C termini. For example, significant identities were observed between the deduced amino acid sequences of the MUC2 and putative MUC5 human mucins (7), the porcine and bovine submaxillary mucins (8,9), and the cysteine-rich C-terminal regions of rat intestinal mucin-like and human MUC2 peptides (10-12).
Genes encoding molecules that have mucin-like features in lower eukaryotes have been detected in Leishmania major (13) and Trypanosoma cruzi (14). Particularly in T. cruzi, the etiological agent of Chagas disease, much work has been done on the biochemical and functional characterization of mucin-like surface glycoconjugates (15). These heavily O-glycosylated molecules are Thr-, Ser-, and Pro-rich and are attached to the membrane by a glycophosphatidylinositol anchor (16). Mucins in T. cruzi are the major acceptors of sialic acid in a reaction catalyzed by trans-sialidase (15,16). Recent evidence suggests that these molecules are involved in the cell invasion process, probably mediating adhesion of the parasite to the mammalian cell surface (17,18). We have previously identified a putative mucin gene in T. cruzi (14) that is small and encodes five repeat units with the consensus sequence T8KP2. In this work, we show that T. cruzi has, in fact, a putative mucin gene family resembling the one present in vertebrate cells. Its members have a Thr/Ser/Pro-rich central domain, which might or might not be organized in repetitive units, and highly conserved non-repetitive flanking domains.
Oligonucleotide Sequences-Sequences are as follows: P1, 5′-CCATGTTCCTCACTGTGTAGAT-3′; P2, 5′-ACATCGGACCACGGTAGAAG-
* This work was supported by grants from the United Nations Developmental Program/World Bank/World Health Organization Special Program for Research and Training in Tropical Diseases; the International Atomic Energy Agency (Vienna, Austria); the Ministerio de Cultura y Educación de la República Argentina; the Universidad de Buenos Aires, Argentina; and Fundación Antorchas, Argentina. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
The nucleotide sequence(s) reported in this paper has been submitted to the GenBank™/EMBL Data Bank with accession number(s) U32572, U32346, U32447, U32448, U32449, and L20809.
‡ Fellow from the Consejo Nacional de Investigaciones Científicas y Técnicas, Argentina.
§ Researchers from the Consejo Nacional de Investigaciones Científicas y Técnicas, Argentina.
Nucleic Acid Purification-DNA was prepared from epimastigotes of T. cruzi using the conventional proteinase K, phenol/chloroform method (24). RNA from the epimastigote and trypomastigote forms was prepared using TRIzol (Life Technologies, Inc.) according to the manufacturer's instructions.
Southern and Northern Blots-DNA was digested with the indicated restriction enzymes, and fragments were separated by electrophoresis in a 2% agarose gel, transferred by capillarity (24) to a Hybond-N nylon membrane (Amersham, Buckinghamshire, United Kingdom), and UV cross-linked using a Stratagene UV cross-linker. RNA was fractionated in a 1.5% agarose gel as described elsewhere (25), then blotted and cross-linked as above. Pulsed-field gel electrophoresis was performed as previously described (26). Filters were hybridized with the different probes described below, using a hybridization solution containing 0.5% SDS, 5× Denhardt's solution, 100 μg/ml salmon sperm DNA, and 3× SSC, and were washed at 65°C in 0.1× SSC, 0.1% SDS.
DNA Cloning-Clones MUC.CA-1, -2, and -3 were obtained from the CA1/72 cloned stock of T. cruzi as follows. Total DNA was digested with SacI, electrophoresed in a 0.8% agarose gel, transferred to a nitrocellulose filter, and hybridized with a probe containing the central repetitive region of the MUC.M/76 gene previously reported (14). This probe detected a double band around 2.5 kilobase pairs. DNA fragments around this size were gel purified and cloned into the pBluescript KS(+) vector (Stratagene).
MUC.RA-1 and -2 clones were obtained from the T. cruzi RA strain by PCR¹ using oligonucleotides P1 and P2. PCR was performed using Vent DNA Polymerase (New England Biolabs) according to the manufacturer's instructions, with an annealing temperature of 60°C. The PCR product was a single band of 800 base pairs that was cloned into the EcoRV-digested pBluescript KS(+) vector.
Radioactive Probes-The "repeats" probe from MUC.M/76 was made by PCR using oligonucleotides R3′ and R5′ to amplify only the repetitive region of the gene, with an annealing temperature of 60°C. The 3′-MUC.M/76 probe was made by PCR using oligonucleotides Mu1 and P2 on the MUC.M/76 clone, with an annealing temperature of 60°C. A fragment extending from oligonucleotide P1 to the stop codon of the coding sequence of the MUC.CA-2 clone was radiolabeled in a primer extension reaction using oligonucleotide P1 as primer. For Northern blot hybridization, this probe was radiolabeled using the random priming method (24). The central MUC.RA-2 probe was made by primer extension on the AccI-digested MUC.RA-2 clone, using oligonucleotide RA3 as primer, with an annealing temperature of 60°C.
DNA Sequencing-Fragments cloned in pBluescript KS(+) were sequenced using different primers by the dideoxy chain termination method (27) using the Sequenase 2.0 kit (U. S. Biochemical Corp.). The template was either double-stranded or single-stranded DNA obtained from pBluescript KS(+) using helper phage M13KO7.

A Family of Putative Mucin Genes Differing in the Number of Repetitive Units-Southern blot analysis of T. cruzi DNA probed with a gene having a mucin-like structure revealed several bands (Fig. 1, panel 2), suggesting the presence of more than one gene in the T. cruzi genome. To study this gene family, a cloned stock of T. cruzi (CA1/72) was used. Twelve positive clones were identified by hybridization with a probe containing the nucleotide tandem repeats ("repeats" probe) present in a putative mucin gene (14). Three groups of clones were detected according to the size of the inserts. One clone of each group was selected for sequencing and named MUC.CA-1, MUC.CA-2, and MUC.CA-3. The deduced amino acid sequence from one of them (MUC.CA-3) is shown in Fig. 2 and compared with the sequence deduced from the gene previously isolated from the Miranda/76 clone (14). The deduced sequences of the two other clones (MUC.CA-1 and MUC.CA-2) were almost identical to that of MUC.CA-3 (see below). All three MUC.CA-deduced sequences showed a central domain made up of tandemly repeated units and non-repetitive flanking domains on the N and C termini. The repetitive units were very similar in all clones analyzed, their consensus sequence being T8KP2. However, the deduced sequences differ in length due to varying numbers of repeat units, being 4, 7, and 10 (including a final imperfect repeat) for MUC.CA-1, -2, and -3, respectively. The non-repetitive N- and C-terminal domains were almost identical in all three MUC.CA clones and very similar to the flanking sequences in the original Miranda/76 clone (Fig. 2). Southern blot experiments using restriction endonucleases that trim genes around the repeats allowed us to estimate that the number of genes containing these repeats in the parasite genome is from three to five (data not shown).
¹ The abbreviation used is: PCR, polymerase chain reaction.
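Counting repeat units in a deduced sequence can be illustrated with a minimal sketch. It assumes that the consensus T8KP2 denotes the 11-residue unit TTTTTTTTKPP and counts only perfect, back-to-back copies, so an imperfect final repeat would not be counted. The function name and the toy sequence are hypothetical and are not taken from the deduced MUC.CA sequences.

```python
import re

# Assumption: the consensus T8KP2 is read as eight Thr, one Lys, two Pro.
CONSENSUS_UNIT = "T" * 8 + "K" + "P" * 2


def count_tandem_repeats(protein: str, unit: str = CONSENSUS_UNIT) -> int:
    """Return the number of units in the longest run of perfect tandem copies."""
    pattern = re.compile(f"(?:{re.escape(unit)})+")
    runs = [len(m.group(0)) // len(unit) for m in pattern.finditer(protein)]
    return max(runs, default=0)


# Toy sequence (hypothetical flanks, four perfect units in the middle).
toy_protein = "MAGS" + CONSENSUS_UNIT * 4 + "GSAD"
print(count_tandem_repeats(toy_protein))
```

Counting degenerate repeats would require a tolerant match (for example, allowing a fixed number of mismatches per unit), which this sketch deliberately leaves out.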

Members of the Putative Mucin Gene Families Might Largely Differ in Sequence-To determine whether genes containing repeats were widely distributed among T. cruzi parasites, several strains and clones were analyzed. Sequences homologous to the repeats present in MUC.CA genes were detected in a second parasite clone (SylvioX-10/7) but not in others (RA stock and CL-Brener clone) (Fig. 1, panel 1). In fact, only 4 out of 13 strains and clones of T. cruzi tested showed sequences homologous to the repeated domain (data not shown). However, the complete MUC.CA gene probe, which contains the 5′- and 3′-flanking sequences in addition to the repeats, revealed several bands in the strains previously mentioned (Fig. 1, panel 2) and in another 9 strains and clones of T. cruzi tested (data not shown). Patterns compatible with a gene family were observed in all strains analyzed.
These results suggest that different isolates and clones of T. cruzi might have related sequences sharing 5′- and 3′-ends but differing in their central regions. To test this possibility, one strain of T. cruzi (RA), which did not hybridize with the repeated region of the MUC.CA genes, was analyzed. Eighteen recombinant clones were obtained by PCR as described under "Experimental Procedures." Two of them, named MUC.RA-1 and MUC.RA-2, were selected for sequencing since they cross-hybridized weakly in Southern blot experiments. Comparison of their deduced amino acid sequences revealed two highly homologous regions having 78 and 81% identity in the N and C termini, respectively, and two degenerate repeats similar to those present in MUC.CA sequences (Fig. 2). Between these two conserved domains, both MUC.RA-1 and -2 genes lack repetitive units and differ almost completely. These results indicate that a single parasite might have a family composed of highly divergent members. Furthermore, since MUC.RA and MUC.CA sequences differ greatly and are specific to each parasite stock, it might be proposed that different parasites have a different putative mucin gene family.
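Percent identity figures such as those quoted for the N and C termini can be computed from an alignment in several ways. The sketch below shows one common convention, under the assumption that the two sequences are already aligned to equal length and that columns containing a gap character are ignored. The text does not state which convention was used, and the function name and example strings are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of identical residues over gap-free columns of a pairwise alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Keep only columns where neither sequence has a gap.
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b) if gap not in (x, y)]
    if not columns:
        return 0.0
    matches = sum(x == y for x, y in columns)
    return 100.0 * matches / len(columns)


# Toy aligned fragments (hypothetical, not from MUC.RA-1 or MUC.RA-2).
print(percent_identity("MLLVA", "MLLVG"))
```

Other conventions divide by the full alignment length or by the length of the shorter sequence, which gives lower values when gaps are present.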
Chromosomal Localization and RNA Blot Analysis of Putative Mucin Genes-Filters containing T. cruzi chromosomal bands fractionated by size using pulsed-field gel electrophoresis were hybridized with either the 3′-end or the "repeats" probe of MUC.M/76 sequences. The conserved 3′-region revealed one or two bands in all of the strains analyzed (Fig. 3, panel 2), while the "repeats" probe only revealed bands in the CA1/72 cloned stock (Fig. 3, panel 1). These results are in agreement with the idea that central domains in the putative mucin gene family differ greatly among parasites while flanking sequences are conserved. In all cases, bands were between the 1.6- and 2.2-megabase pair markers, showing that this gene family is not dispersed throughout the genome but restricted to a few chromosomal bands. An interesting result from these hybridizations was observed in the CA1/72 stock. While the 3′-probe hybridized with a unique band (Fig. 3, panel 2), the "repeats" probe lit up an additional band (Fig. 3, panel 1). This observation suggests that some genes with tandem repeats do not have conserved flanking sequences, unlike the pattern described in the MUC.CA and MUC.RA sequences studied. This might further increase the number of variants in the T. cruzi putative mucin gene family. Northern blots were carried out to determine the size of the transcripts. A complete MUC.CA-2 probe detected a broad band around 1 kilobase in the four parasite strains and clones tested (Fig. 4 and data not shown). The epimastigote and cell-derived trypomastigote parasite forms showed the same pattern of bands.
General Structure of the Putative Mucin Deduced Amino Acid Sequences-A schematic representation of the structure of the MUC.RA and MUC.CA deduced amino acid sequences and their percentage of identity is shown in Fig. 5. Interestingly, domains similar to those present in mammalian mucins can be identified, including a putative signal peptide, a non-repetitive mature N terminus, a central domain rich in Thr-Ser-Pro residues, and the C-terminal non-repetitive domain. The deduced sequences on the N termini are very similar in all MUC.CA and MUC.RA sequences studied, comprising the first 20-25 amino acids in the different members. This region has the three common structural features present in signal sequences (28): an n-region with net positive charge, located in the first 10 or 5 amino acids of MUC.CA and MUC.RA, respectively; a central hydrophobic h-region, made up of 10 amino acids, 6 of which are Leu and 2 of which are Val residues; and finally a polar c-region 5 amino acids long, 3 of which are Cys residues (Fig. 2). Since all described mucins in T. cruzi are located on the parasite surface, it is likely that the first 20-25 amino acids in MUC.CA and MUC.RA products indeed constitute signal peptides. Furthermore, this region is 57% identical to the 21 amino acids in the signal peptide of human MUC18, a melanoma antigen gene with sequence homology to neural cell adhesion molecules of the Ig superfamily (29).
After the putative signal peptide, MUC.CA and MUC.RA members have a short N-terminal extension in the mature protein, which, in MUC.CA members, can be defined by the 11 amino acids just before the first repeat unit. Then follow the central domains, which are rich in Thr (from 16 to 74%), Ser (from 0 to 12%), and Pro (from 10 to 17%) residues. Given the mucin-like structure of the deduced proteins, a computational method for prediction of O-glycosylation of mammalian proteins was used (30). Predicted O-glycosylated residues provided by this service (NetOglyc@cbs.dtu.dk) are 64 for MUC.CA-2, 87 for MUC.CA-3, 50 for MUC.M/76, 30 for MUC.RA-1, and 21 for MUC.RA-2. Further work will clarify the O-glycosylation status of these gene products, which is at present unknown. In the case of MUC.CA-deduced sequences, central domains are entirely made up of repeat units. In MUC.RA sequences, most of the central domains lack repetitive units, but two degenerate repeats are present close to the C termini.
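The Thr, Ser, and Pro percentages quoted for the central domains are residue compositions. A minimal sketch of that arithmetic follows, assuming one-letter amino acid codes and a percentage taken over the length of the domain being examined. The function name is hypothetical, and the example input is a single consensus-like unit rather than any of the deduced sequences.

```python
from collections import Counter


def residue_percentages(domain: str, residues: str = "TSP") -> dict:
    """Percentage of each requested residue over the full length of `domain`."""
    if not domain:
        raise ValueError("domain sequence must not be empty")
    counts = Counter(domain.upper())
    return {r: 100.0 * counts[r] / len(domain) for r in residues}


# Toy input: one unit read as eight Thr, one Lys, two Pro (an assumption).
composition = residue_percentages("TTTTTTTTKPP")
for residue, pct in composition.items():
    print(f"{residue}: {pct:.1f}%")
```

Applying the same function to the whole deduced protein rather than the central domain alone would give lower percentages, since the flanking domains are not enriched in these residues.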
Finally, MUC.CA and MUC.RA members have a C-terminal domain comprising the last 54 amino acids, which is almost identical in all sequences studied (Fig. 5). This region has only 1 or 2 Cys residues in the different members, in contrast to the Cys-rich terminal domains present in mammalian secretory mucins (10,12). Hydrophobic residues make up 75% of the last 16 amino acids, which lack a polar cytoplasmic tail, suggesting that this extension might be related to the addition of a glycophosphatidylinositol anchor (31).
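The hydrophobic content of a C-terminal stretch can be estimated with a short calculation. The sketch below assumes a particular set of residues counted as hydrophobic (A, V, L, I, M, F, W), which is only one of several conventions and is not stated in the text. The function name and the toy sequence are hypothetical.

```python
# Assumption: residues treated as hydrophobic; other scales classify differently.
HYDROPHOBIC = frozenset("AVLIMFW")


def tail_hydrophobic_percent(protein: str, tail_length: int = 16) -> float:
    """Percent of hydrophobic residues in the last `tail_length` residues."""
    if tail_length <= 0:
        raise ValueError("tail_length must be positive")
    tail = protein.upper()[-tail_length:]
    if not tail:
        raise ValueError("protein sequence must not be empty")
    hydrophobic_count = sum(residue in HYDROPHOBIC for residue in tail)
    return 100.0 * hydrophobic_count / len(tail)


# Toy sequence: 12 hydrophobic residues among the last 16 (hypothetical).
toy_protein = "G" * 10 + "LLLLAAAAVVVVGGSS"
print(tail_hydrophobic_percent(toy_protein))
```

A window-based hydropathy scale would give a more graded picture than this binary classification, at the cost of choosing a scale and a window size.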
The overall organization of these putative mucin genes and gene family in T. cruzi somewhat resembles that of mammalian cells. MUC.CA and MUC.RA members all share sequences in the N- and C-terminal domains, suggesting a common origin. Between these two regions, MUC.CA and MUC.RA members have diverged almost completely. However, the two degenerate repeats remaining in MUC.RA genes raise the possibility that they in fact originated from genes made up of perfect repeat units. In this context, a human airway mucin contains a virtually perfect 87-base pair tandem repeat, but numerous deletions or insertions resulting in many frameshifts destroy the repetitive structure in the encoded peptide (32). A somewhat related precedent of conserved sequences flanking variable tandem repeats in a protozoan was described for two genes encoding S-antigens of Plasmodium falciparum, the causative agent of malaria. These genes are homologous over their N and C termini and even over the flanking non-coding regions, but their central regions are formed of repeats so different that they do not even cross-hybridize on Southern blots performed at low stringency (33).