Identification of a family of streptococcal surface proteins with extremely repetitive structure.

The group B Streptococcus (GBS) causes the majority of life-threatening bacterial infections in newborn children. Most GBS strains isolated from such infections express a surface protein, designated Rib, that confers protective immunity and therefore is of interest for analysis of pathogenetic mechanisms. Sequence analysis demonstrated that Rib has an exceptionally long signal peptide (55 amino acid residues) and 12 repeats (79 amino acid residues each) that account for >80% of the sequence of the mature protein. The repeats are identical even at the DNA level, indicating that an efficient mechanism operates to maintain a highly repetitive structure in Rib. The structure of Rib is similar to that of α, a previously characterized surface protein that is common among GBS strains lacking Rib. However, highly purified preparations of Rib and α did not cross-react immunologically, although the two proteins show extensive amino acid residue identity (47% in the repeat region). When analyzed in Western blots, Rib and α give rise to a regularly spaced ladder pattern, apparently due to hydrolysis of acid-labile Asp-Pro bonds in the repeats. We conclude that Rib and α are members of a novel family of streptococcal surface proteins with unusual repetitive structure.

The group B Streptococcus (GBS) is the major cause of life-threatening bacterial infections in the neonatal period. Many children are exposed to this bacterium at birth, when they may be colonized by GBS present in the vaginal flora of the mother. In most cases such colonization does not cause disease, but a minority of newborns fall seriously ill after birth due to invasive GBS infection. Other children are born ill, due to an infection starting during the later part of the pregnancy (1).
Immunity to GBS infection may be elicited both by the polysaccharide capsule (1,2) and by different cell surface proteins (3-6). Detailed studies of the capsule have shown that it occurs in several different serotypes that vary in their importance for human disease (1,2). Among the surface proteins that confer immunity, the first to be identified were two molecules designated α and β (3,7,8), which have been extensively characterized (9-14). However, α and β are almost never expressed by strains of capsular type III, which cause the majority of invasive infections (1). In contrast, the recently identified protein Rib was found to be expressed by most strains causing life-threatening infections, including almost all strains of serotype III (6). Thus, protein Rib is of considerable interest for the analysis of pathogenetic mechanisms in GBS infections and for vaccine development. This situation motivates detailed biochemical and immunological characterization of protein Rib.
Studies of protein Rib previously showed that it shares several properties with the α protein (6). Both of these proteins are resistant to trypsin and vary greatly in size between different isolates of GBS. Moreover, the NH2-terminal sequences of Rib and α were found to be related. These and other data suggested that the two proteins might be members of a family of proteins with related function. We have now sequenced the rib gene and compared it with the previously reported sequence of the α gene, which is known to be very repetitive (14). In addition, highly purified preparations of the Rib and α proteins have been characterized with regard to some immunochemical and biochemical properties. Our data show that Rib and α define a family of bacterial surface proteins with unique repetitive structure.

EXPERIMENTAL PROCEDURES
Bacterial Strains and Cloning Vectors-The GBS strain BM110 (6) is a serotype III isolate obtained from Dr. S. Mattingly (University of Texas, San Antonio, TX). Escherichia coli strain LE392 (Genofit, Geneva, Switzerland) was used as a host for the cloning vector EMBL3 (Promega Co., Madison, WI). For subcloning, E. coli strain XL1-Blue (which is recA1) (Stratagene, La Jolla, CA) was used as a host for the cloning vector pGEM7Z(f+) (Promega Co.), and the E. coli strain JM103 (Amersham Corp.) was used as a host for the sequencing vectors M13mp18 or M13mp19 (Amersham Corp.). Standard techniques were used for work with E. coli and cloning vectors (15).
Media, Chemicals, and Purified Proteins-GBS was grown in Todd-Hewitt broth, and E. coli was grown in LB broth at 37°C. Ampicillin (50 μg/ml) and tetracycline (5 μg/ml) were added when appropriate. Restriction enzymes were purchased from Promega Co., New England Biolabs Inc. (Beverly, MA) or Boehringer Mannheim.
The Rib, α, and β proteins were purified from extracts of strains BM110, A909, and SB35, respectively, by a combination of ion exchange and molecular sieve chromatography (6), followed by a final step of hydroxylapatite chromatography for removal of small amounts of contaminating polysaccharides.
DNA Sequencing and Sequence Analysis-DNA sequences were determined by the dideoxy chain termination method using [α-35S]dATP (Amersham Corp.) and Sequenase 2.0 (Amersham Corp.). Recombinant M13mp18 or M13mp19 phage DNA was used as template. M13 universal primer and −40 primer (Amersham Corp.) as well as custom made primers were used. The sequencing reaction products were resolved on 8% polyacrylamide-urea gels. Gels were run at 40 W for 1-4 h on a sequencing unit from Cambridge Electrophoresis Ltd. (Cambridge, UK), fixed in 10% methanol, 10% acetic acid for 15 min, and dried on Whatman 3MM papers under vacuum. Computer-assisted analysis of DNA sequences was performed with the GCG software package (16) and the GeneWorks program (IntelliGenetics, Inc., Mountain View, CA).
Polymerase Chain Reaction Analysis-The rib gene was amplified from purified DNA in a 50-μl volume using primers with the sequences 5′-TGACTAAAAATGTTCAGAATGGTAG-3′ and 5′-GAAACAGATAATAAACCAACTGATG-3′. Each reaction mixture contained 12.5 pmol of each primer, 0.2 mM dNTPs, 2.5 units AmpliTaq DNA polymerase (Perkin-Elmer) and 1.5 mM MgCl2 in the incubation buffer supplied with the enzyme. PCR amplification was performed by 30 repeated cycles on a programmable thermal controller (PTC-100, Promega Co.) with a thermal step program that included: denaturation at 94°C for 60 s, annealing at 57°C for 60 s, and primer extension at 72°C for 120 s. Amplified material was analyzed on 1.0% agarose gels.
Solid Phase Radioimmunoassay-Microtiter plates (Falcon 3912, Becton Dickinson, Oxnard, CA) were coated with purified protein Rib or α by incubation for 16 h with 100 μl of a solution (100 ng/ml) of protein in PBS (0.03 M phosphate, 0.12 M NaCl, pH 7.2). The wells were blocked by washing with veronal-buffered saline (10 mM veronal buffer, 0.15 M NaCl, pH 7.4) supplemented with 0.25% gelatin and 0.25% Tween 20. Rabbit antisera against the Rib and α proteins (6) were used at dilutions corresponding to 50-60% of maximal binding. The binding between anti-Rib and immobilized Rib and between anti-α and immobilized α was inhibited by the addition of purified Rib or α. For these inhibition experiments 100-μl aliquots of antiserum in PBSAT (PBS containing 0.02% NaN3 and 0.05% Tween 20) were preincubated for 30 min with various amounts (160 pg to 500 ng) of Rib or α and then added to the wells. After 3 h of incubation the wells were washed three times with PBSAT, and the presence of antibodies was analyzed by the addition of 125I-labeled protein G (20,000 cpm in 100 μl/well) and incubation for 2 h. After three washes with PBSAT, the radioactivity of each well was determined in a γ-counter. Nonspecific binding (less than 1%) was determined in wells coated with buffer (PBS) alone. All incubations were performed at room temperature.
Other Methods-SDS-PAGE was performed using a Protean II cell (Bio-Rad). The gels were stained with Coomassie Brilliant Blue R-250 or transferred by electroblotting to Immobilon filters (Millipore Corp., Molsheim, France) in a Semi-Dry Electroblotter (Ancos, Vig, Denmark). Tricine gels were used for the analysis of peptide fragments (17). For Western blot analysis, membranes were incubated with antisera as described (6). Amino-terminal sequence analysis of proteins transferred to ProBlott membranes was performed with a 470A Protein Sequencer (Applied Biosystems, Foster City, CA).

RESULTS
Cloning and Sequence Analysis of the rib Gene-The rib gene was cloned from the type III strain BM110, a member of a putative high virulence clone of GBS (18). The sequence of the entire rib gene and the deduced amino acid sequence of the Rib protein are shown in Fig. 1. Comparison of this sequence with the previously determined NH2-terminal sequence of Rib demonstrated that the signal sequence has a length of 55 amino acid residues. A region with 12 identical repeats (each with a length of 79 amino acid residues) and a partial repeat (15 amino acid residues) accounts for >80% of the sequence of the mature protein. As described below, the repeats are apparently identical even at the DNA level. The processed form of protein Rib has a length of 1176 amino acid residues and a predicted molecular mass of 123 kDa.
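The statement that the repeat region accounts for more than 80% of the mature protein can be checked directly from the figures given above. The following is a minimal sketch; the variable names are illustrative and not part of the original work.

```python
# Figures taken from the text; names are illustrative only.
SIGNAL_LEN_AA = 55        # signal sequence
REPEAT_LEN_AA = 79        # one full repeat
N_FULL_REPEATS = 12
PARTIAL_REPEAT_AA = 15    # partial repeat following the 12 full repeats
MATURE_LEN_AA = 1176      # processed form of protein Rib

# Length of the repeat region (full repeats plus the partial repeat).
repeat_region_aa = N_FULL_REPEATS * REPEAT_LEN_AA + PARTIAL_REPEAT_AA

# Fraction of the mature protein made up of repeats.
repeat_fraction = repeat_region_aa / MATURE_LEN_AA

# Residues of the mature protein that lie outside the repeat region.
non_repeat_aa = MATURE_LEN_AA - repeat_region_aa

print(f"repeat region: {repeat_region_aa} aa "
      f"({repeat_fraction:.1%} of the mature protein)")
print(f"non-repeated residues: {non_repeat_aa} aa")
```

With these numbers the repeat region spans 963 residues, about 82% of the mature protein, which agrees with the >80% stated above.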
The highly repetitive structure of the rib gene caused considerable difficulties during the sequencing work. Because these problems are of general interest with regard to the sequencing of repetitive genes, they will be briefly summarized below.
Initially, an EMBL3 clone expressing protein Rib was isolated and used to construct the subclone pGRib105. Preliminary sequence analysis of pGRib105 allowed the identification of the 5′ and 3′ ends of the rib gene. Analysis of the central part of the gene showed that partial digestion with BglII gave rise to a regular ladder pattern on agarose gels, indicating the existence of repeated sequences containing BglII sites (data not shown). Sequence analysis indeed demonstrated the presence of repeats corresponding to 79 amino acid residues. This initial analysis indicated that Rib has a highly repetitive structure, as previously reported for the α protein of GBS (14).
To further characterize the repeat region, PCR analysis was performed, allowing amplification of the whole rib gene. For chromosomal DNA, the main PCR product had a size of ~3,400 bp, corresponding to a rib gene with 12 repeats. However, the pGRib105 subclone generated a main PCR band of ~2,700 bp, corresponding to a rib gene with 9 repeats, implying that part of the repeat region had been lost during the initial cloning in the vector. An interesting observation made during the PCR analysis was that the PCR product not only contained the main band but also gave rise to a ladder of bands with a size difference of ~237 bp, corresponding to one repeat (Fig. 2). This ladder could be the result of slippage of Taq polymerase during replication, due to the unique repetitive structure of the rib gene.
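The relationship between PCR product size and repeat number can be illustrated with a short calculation. This is only a sketch under stated assumptions: the function names are hypothetical, the flanking length is estimated from the approximate 3,400-bp chromosomal product rather than taken from the sequence, and it lumps together everything outside the 12 full repeats (including the partial repeat).

```python
REPEAT_AA = 79
REPEAT_BP = REPEAT_AA * 3  # one repeat at the DNA level: 237 bp


def flank_bp(product_bp: int, n_repeats: int) -> int:
    """Estimate the amplified sequence outside the full repeats (assumption:
    product size = flank + n_repeats * REPEAT_BP)."""
    return product_bp - n_repeats * REPEAT_BP


def estimate_repeats(product_bp: int, flank: int) -> int:
    """Nearest whole number of repeats consistent with a product size."""
    return round((product_bp - flank) / REPEAT_BP)


def ladder(flank: int, max_repeats: int) -> list[int]:
    """Expected band sizes if products differ by whole repeats."""
    return [flank + n * REPEAT_BP for n in range(max_repeats, 0, -1)]


flank = flank_bp(3400, 12)                        # from the chromosomal product
subclone_repeats = estimate_repeats(2700, flank)  # pGRib105 product
chromosomal_ladder = ladder(flank, 12)

print(REPEAT_BP, flank, subclone_repeats)
print(chromosomal_ladder[:4])
```

Under these assumptions the 2,700-bp product from pGRib105 corresponds to 9 repeats, and adjacent ladder bands differ by 237 bp, consistent with the sizes reported above.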
Based on the results of the PCR analysis, new attempts were made to clone the entire rib gene in E. coli. Because it seemed possible that Rib had a toxic effect on E. coli, the rib gene was cloned without the promoter and signal sequence regions. Appropriate fragments of chromosomal DNA from strain BM110 were cloned directly into the pGEM7Z(f+) vector, generating clone pGRib116. Initial analysis of this clone showed that it contained a repeat region of the same size as the chromosomal rib gene. However, further analysis of pGRib116 indicated that the repeat region in this clone was highly unstable, although it was maintained under Rec− conditions and not expressed (data not shown). Because the entire repeat region of the rib gene could not be stably maintained in E. coli, it was not possible to analyze the sequence of this region with standard methods.
FIG. 1. Nucleotide sequence of the rib gene from strain BM110 and deduced amino acid sequence. The sequence is divided into a 5′ part, a central part with 12 identical repeats and a partial repeat, and a 3′ part. The box indicates a possible ribosomal binding site. The vertical arrow indicates the end of the signal sequence. The dashed line indicates the NH2-terminal sequence determined for protein Rib from strain BM110; this NH2-terminal sequence differs from that of Rib isolated from another strain (6) at one position. The horizontal arrows indicate the position of the repeats as well as of a partial repeat. The sequence data have been submitted to the GenBank data base (accession number U58333).
To analyze the sequence of the repeat region, we chose to sequence individual repeats cloned at random. As described above, our analysis of the rib gene had indicated that all repeats contained a unique BglII site. We therefore cloned fragments obtained by BglII digestion of plasmid pGRib116, assuming that they would be representative of the whole repeat region. A total of 13 repeats were analyzed, and all of them were found to have identical nucleotide sequences. The conclusion that all repeats are identical was further supported by analysis of sequences at the extremities of the repeat region. The 5′ half of the first repeat (up to the BglII site) and the 3′ half of the last complete repeat (downstream from the BglII site) together formed a repeat whose nucleotide sequence was identical to that of repeats recovered after BglII digestion. In addition, the partial repeat (coding for 15 amino acid residues) had a nucleotide sequence identical to the corresponding region in the complete repeats.
Comparison between the Rib and α Proteins-Previous studies have shown that the α protein of GBS has a very repetitive structure with long repeats that are identical even at the DNA level (14), as reported here for protein Rib. As shown in Fig. 3, these two surface molecules of GBS exhibit extensive amino acid residue identity. The signal sequences show 80% residue identity and are unusually long: 55 residues in protein Rib (Fig. 1) and 56 residues in the α protein (6). In the non-repeated NH2-terminal parts of the mature proteins (174 and 170 residues, respectively), the degree of residue identity is 61%. The repeats (79 and 82 residues, respectively) show a somewhat lower degree of residue identity, 47%. The short COOH-terminal regions of the two proteins are almost identical and have the characteristics of cell wall attachment regions in surface proteins of Gram-positive bacteria, including an LPXTG sequence (19).
FIG. 3. ... (14). The two vertical arrows indicate the ends of the signal sequences (6). The repeat regions are shown in the shaded box. Only one full repeat from each protein is shown, followed by the partial repeat. B, overall structure of Rib from strain BM110 and α from strain A909 and degree of amino acid residue identity between different regions of the proteins. S, signal peptide; N, NH2-terminal region; R, one repeat; P, partial repeat; C, COOH-terminal region. The number of amino acids in each region is indicated. The Rib protein has 12 repeats of 79 amino acids, and the α protein has 9 repeats of 82 amino acids.
The Rib and α proteins have an unusually high content of Asp, Val, Thr, Pro, and Lys, which together account for about 60% of the amino acid residues in each protein. Computer-assisted analysis indicated that the Rib and α proteins are highly acidic, with isoelectric points of 4.3 and 4.5, respectively. Analysis of the protein sequences by protein structure algorithms (Ref. 16 and the GeneWorks program) predicted a high β-sheet content in each protein, including the repeat regions.
Immunological Relationship between the Rib and α Proteins-The Rib and α proteins were previously found to be immunologically unrelated, when analyzed with specific rabbit antisera in Western blots and Dot-blots (6). However, the extensive sequence homology between the two proteins suggested that a cross-reactivity might be detected if more sensitive methods were used. To analyze this possibility, inhibition tests were performed (Fig. 4). The reactivity between Rib, immobilized in microtiter plates, and anti-Rib serum was inhibited by pure protein Rib, but the addition of the α protein did not cause any inhibition even when a large excess was added (Fig. 4A). Similarly, the reaction between α and anti-α serum was inhibited by purified α protein but not by protein Rib (Fig. 4B). These results indicate that the large majority of antibodies directed against Rib or α completely lack reactivity for the heterologous antigen.
Aberrant Migration Behavior of the Rib and α Proteins in SDS-PAGE-An unusual feature of Rib and α is their behavior in SDS-PAGE gels, where the apparent molecular mass of each protein was found to vary depending on the acrylamide concentration of the gel (Fig. 5A). At an acrylamide concentration of 5% the major polypeptide species in the Rib and α protein preparations migrated at positions corresponding to molecular masses of about 178 and 166 kDa, respectively (Fig. 5B), but at an acrylamide concentration of 10% the apparent molecular masses were approximately 107 and 111 kDa, respectively (Fig. 5C). According to the deduced amino acid sequences, the predicted molecular masses of the mature Rib and α proteins are 123 and 103 kDa, respectively. Unlike Rib and α, the group B streptococcal β protein, an IgA-binding surface protein that is structurally unrelated to the Rib and α proteins and lacks long repeats (11,12), had the same apparent molecular mass in the different SDS-PAGE gels (Fig. 5).
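The extent of the aberrant migration can be expressed as the ratio of apparent to predicted molecular mass at each acrylamide concentration. The sketch below uses only the values quoted above; the dictionary layout and function name are illustrative assumptions.

```python
# Predicted masses (kDa) from the deduced sequences, as given in the text.
predicted_kda = {"Rib": 123, "alpha": 103}

# Apparent masses (kDa) of the major band at 5% and 10% acrylamide.
apparent_kda = {
    "Rib": {5: 178, 10: 107},
    "alpha": {5: 166, 10: 111},
}


def apparent_to_predicted(protein: str, acrylamide_pct: int) -> float:
    """Ratio of apparent mass in SDS-PAGE to sequence-predicted mass."""
    return apparent_kda[protein][acrylamide_pct] / predicted_kda[protein]


ratios = {
    (protein, pct): round(apparent_to_predicted(protein, pct), 2)
    for protein in predicted_kda
    for pct in (5, 10)
}

for (protein, pct), ratio in ratios.items():
    print(f"{protein} at {pct}% acrylamide: apparent/predicted = {ratio}")
```

On these figures both proteins run well above their predicted mass at 5% acrylamide (ratios of about 1.45 and 1.61) and close to or below it at 10% (about 0.87 and 1.08).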

Analysis of Ladder Patterns Generated by the Rib and α Proteins in SDS-PAGE: Evidence for Hydrolysis of Acid-labile Asp-Pro Bonds-It has previously been reported that bacterial extracts containing the α protein give rise to a regular ladder pattern in immunoblotting experiments, indicating that the α protein is size heterogeneous (20). Interestingly, the distance between the ladder steps was found to correspond to one repeat, suggesting that the different molecular species in the ladder represented polypeptides with different numbers of repeats (14). A similar ladder pattern was also observed in Western blots of the Rib protein (6). It was suggested that this size heterogeneity could be the result of early termination of translation, RNA-mediated self cleavage, acid hydrolysis, or protease activity (14). A repetitive protein from the salivary glands of Chironomus tentans has also been shown to form a regular ladder pattern in Western blots, and it was suggested that the heterogeneity reflects a degradation that occurs naturally in the salivary glands (21). It was therefore of interest to analyze the mechanism that generates such ladder patterns.
Analysis of the sequences of the Rib and α proteins suggested that the ladder pattern might be due to hydrolysis of Asp-Pro bonds, which are found in the repeats of both proteins (Fig. 6D). It is known that such bonds are sensitive to acid hydrolysis (22). To analyze whether acid-labile sites are responsible for the ladder pattern, purified preparations of the Rib and α proteins were first analyzed under standard conditions (Fig. 6A). Under these conditions, the ladder pattern was seen in blots but not in stained gels, indicating that only a small fraction of the purified proteins were of lower molecular weight and gave rise to the ladder (Fig. 6A). Next, the purified Rib and α proteins were incubated at pH 4.0 at 37°C for 16 h before analysis. The resulting preparations were either boiled directly in sample buffer or neutralized before boiling in sample buffer. When these preparations were analyzed by SDS-PAGE, distinct ladder patterns, readily detectable also in stained gels, were formed when the proteins had been boiled for 5 min in sample buffer at acidic pH (Fig. 6B). However, only a minor degradation was detected in the samples that had been neutralized before the analysis (data not shown). Thus, the ladder patterns were largely due to fragmentation during boiling in non-neutralized sample buffer (Fig. 6B). The Rib and α proteins were further degraded when the samples were boiled at acidic pH for a longer period (15 min), as detected in a stained Tricine gel (Fig. 6C). In contrast, the group B streptococcal β protein, which does not contain Asp-Pro sequences (11,12), was not degraded at acidic pH (Figs. 6, B and C). The repeats in the Rib protein contain two Asp-Pro sites (Fig. 6D), which may explain why this protein gives rise to doublet bands (Fig. 6B).
FIG. 4. Immunological relationship between the Rib and α proteins, analyzed by solid phase radioimmunoassay. Highly purified preparations of Rib or α were immobilized in microtiter wells and allowed to react with rabbit antibodies to the corresponding protein.
FIG. 5. Analysis of the apparent molecular mass of the purified Rib, α, and β proteins. A, relationship between acrylamide concentration and apparent molecular mass in SDS-PAGE. B and C, stained SDS-PAGE gels of purified Rib, α, and β proteins analyzed at acrylamide concentrations of 5 (B) and 10% (C). The preparations of Rib and α give rise to one major band and one minor band, as described previously (6). The molecular mass was determined for the major band. Molecular mass markers (in kDa) are shown to the right in each gel.
To further analyze the formation of the ladder, bands generated by the Rib and α proteins at acidic pH were subjected to NH2-terminal sequence analysis. Bands analyzed included those labeled a-d in Fig. 6, B and C, as well as polypeptides of higher molecular weight. All bands analyzed had sequences identical to the NH2-terminal sequences of the mature proteins, i.e. AEVIS for the Rib protein and STIPG for the α protein (Fig. 6D). These data may be explained by assuming that acid hydrolysis occurred at all Asp-Pro sites in the Rib and α proteins, except the most NH2-terminally located site in each protein. Cleavage at this site would have given rise to a short NH2-terminal fragment, which was not detected.
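The way cleavage at Asp-Pro bonds in identical repeats produces a regularly spaced ladder of NH2-terminal fragments can be sketched as follows. The toy sequence is hypothetical and is not the Rib sequence: only the AEVIS start and the LPXTG-type ending are borrowed from the text, and the 10-residue repeat with two Asp-Pro sites is invented for illustration.

```python
def asp_pro_sites(seq: str) -> list[int]:
    """Positions of Asp-Pro bonds: site i means cleavage yields seq[:i]
    (ending in D) and seq[i:] (starting with P)."""
    return [i + 1 for i in range(len(seq) - 1) if seq[i:i + 2] == "DP"]


def n_terminal_fragments(seq: str) -> list[str]:
    """All fragments that keep the original NH2 terminus after a single
    cleavage at any Asp-Pro bond, plus the intact chain."""
    return [seq[:site] for site in asp_pro_sites(seq)] + [seq]


# Hypothetical toy protein (NOT the real Rib sequence).
N_TERM = "AEVISGGG"          # starts with the NH2-terminal residues of Rib
TOY_REPEAT = "TDPKVADPSL"    # invented repeat with two Asp-Pro sites
C_TERM = "LPSTG"             # invented LPXTG-type ending
toy_protein = N_TERM + TOY_REPEAT * 4 + C_TERM

sites = asp_pro_sites(toy_protein)
fragments = n_terminal_fragments(toy_protein)
lengths = [len(f) for f in fragments]

print(sites)
print(lengths)
```

In this toy case every fragment begins with AEVIS, fragment lengths cut at equivalent sites in successive repeats differ by exactly one repeat length, and the two sites per repeat give pairs of closely spaced fragments, analogous to the doublet bands described for Rib.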
Although the data reported above suggest that the ladder pattern observed for the Rib and α proteins is generated by cleavage of Asp-Pro bonds, cleavage of such bonds would be expected to generate both NH2-terminal and COOH-terminal fragments as well as internal peptides generated by hydrolysis of Asp-Pro sites in the repeats (7.2- and 1-kDa peptides from the repeats of protein Rib and an 8.7-kDa peptide from the repeats of the α protein). Surprisingly, neither COOH-terminal fragments nor internal peptides were found, indicating that these peptides had been further degraded or lost during the analysis (Fig. 6C). Interestingly, the ladder pattern formed by the salivary gland protein from C. tentans also showed the absence of internal peptides corresponding to single repeats (21).

DISCUSSION
In this report we describe the nucleotide sequence of the gene for the streptococcal surface protein Rib and show that Rib together with the α protein of GBS define a novel family of bacterial surface proteins. The most remarkable feature of Rib and α is the presence of an extensive repeat region, in which the repeats are identical even at the DNA level, as first reported for the α protein (14). The presence of this repeat region can explain why the Rib and α proteins expressed by different clinical isolates of GBS show size variation (6,20), which could arise through recombination or slipped strand mispairing during DNA replication. If size variation occurred frequently, it might also explain why the repeats are identical, because frequent expansions and contractions of the repeat region could eliminate divergent repeats that arise through mutation. Such frequent variation in size has been reported for a gene with short identical repeats in Ureaplasma urealyticum (23). However, protein Rib expressed by a given clinical isolate of GBS shows great stability in size, at least under laboratory conditions (6). This finding indicates that the size variation between Rib molecules expressed by different strains of GBS is due to rare events and makes it difficult to explain the identity between the repeats through a model with frequent expansions and contractions in the repeat region. In addition, expansions and contractions would be expected to generate divergent repeats at the extremities of the repeat region, as described for the M6 protein of Streptococcus pyogenes (24). However, all repeats within the rib gene, and also within the α gene, are apparently identical. A possible alternative explanation for the identity of the repeats could be the existence of base pairing within RNA secondary structures, which might limit mutation frequencies (25). However, it is clear that mutations can accumulate in the repeat region, because the repeats of Rib and α must have evolved from a common predecessor (Fig. 3). Taken together, these data suggest that any mutation that occurs in a repeat is either rapidly eliminated or rapidly spread to the other repeats, by unknown mechanisms.
With regard to long repeats that are identical also at the nucleotide level, it should be noted that such repeats have also been described for genes encoding human apolipoprotein(a) (26) and the salivary protein from C. tentans referred to above (21). In addition, several genes of the mucin family have highly conserved repeat regions (27), and the gene for porcine submaxillary gland apomucin contains long repeats that are identical also at the nucleotide level (28). For the Rib and α proteins, it seems possible that the repeat regions confer unusual physico-chemical properties on the proteins and contribute to their protease resistance (6,13) and aberrant migration behavior in SDS-PAGE (Fig. 5).
The Rib and α proteins form a regular ladder pattern in Western blots, where the distance between the steps in the ladder was found to correspond to the size of one repeat (6,14,20). A similar ladder pattern was reported for the repetitive salivary glycoprotein from C. tentans (21). It was assumed that this size heterogeneity represents a true heterogeneity in the proteins, and various mechanisms have been proposed that could generate the ladder pattern (14,21). However, the analysis of the Rib and α proteins reported here suggests that the apparent size heterogeneity may at least partially be due to hydrolysis, during the analysis, of acid-labile Asp-Pro bonds in the repeats. In agreement with this explanation, Asp-Pro bonds are present not only in the repeats of the Rib and α proteins but also in the C. tentans protein (21). Thus, it remains an open question whether these proteins really are size heterogeneous in vivo. However, it should be noted that the normal habitat of GBS is the human vagina, where the normal pH is less than 4.5 (29), a condition that could favor some hydrolysis of Asp-Pro bonds also in vivo and cause release of biologically active polypeptides from the bacterial cell.
The Rib and α proteins show 61% residue identity in the unique NH2-terminal regions and 47% identity in the repeat regions (Fig. 3). Within these regions, shorter sequences with an even higher degree of homology are also present, underlining the close evolutionary relationship between Rib and α. It was therefore surprising that experiments reported previously had indicated that the two proteins are immunologically unrelated (6). However, the inhibition experiments reported here confirmed the results previously obtained (Fig. 4). This result suggests that the sequences that are similar in Rib and α are not immunogenic or represent epitopes that are hidden in the intact proteins.
In summary, the Rib and α proteins of the group B Streptococcus have been shown to be members of a family of bacterial surface proteins with remarkable repetitive structure. Further characterization of this protein family is of interest for studies of genes with highly repetitive sequence, which are common both in bacteria and in man. In addition, the knowledge that is now available about Rib and α will permit studies of their biological function and definition of protective epitopes. Such studies are of interest with regard to the mechanisms of pathogenesis used by Gram-positive bacteria and may also contribute to the development of a protein vaccine against GBS disease.