Carbohydrate-induced Differential Gene Expression Patterns in the Hyperthermophilic Bacterium Thermotoga maritima

The hyperthermophilic bacterium Thermotoga maritima MSB8 was grown on a variety of carbohydrates to determine the influence of carbon and energy source on differential gene expression. Although T. maritima has been phylogenetically characterized as a primitive microorganism from an evolutionary perspective, results here suggest that it has versatile and discriminating mechanisms for regulating and effecting complex carbohydrate utilization. Growth of T. maritima on monosaccharides was found to be slower than growth on polysaccharides, although growth to cell densities of 10^8 to 10^9 cells/ml was observed on all carbohydrates tested. Differential expression of genes encoding carbohydrate-active proteins encoded in the T. maritima genome was followed using a targeted cDNA microarray in conjunction with mixed model statistical analysis. Coordinated regulation of genes responding to specific carbohydrates was noted. Although glucose generally repressed expression of all glycoside hydrolase genes, other sugars induced or repressed these genes to varying extents. Expression profiles of most endo-acting glycoside hydrolase genes correlated well with their reported biochemical properties, although exo-acting glycoside hydrolase genes displayed less specific expression patterns. Genes encoding selected putative ABC sugar transporters were found to respond to specific carbohydrates, and in some cases putative oligopeptide transporter genes were also found to respond to specific sugar substrates. Several genes encoding putative transcriptional regulators were expressed during growth on specific sugars, thus suggesting functional assignments. The transcriptional response of T. maritima to specific carbohydrate growth substrates indicated that sugar backbone- and linkage-specific regulatory networks are operational in this organism during the uptake and utilization of carbohydrate substrates. Furthermore, the wide-ranging collection of such networks in T. maritima suggests that this organism is capable of adapting to a variety of growth environments containing carbohydrate growth substrates.

Saccharolytic microorganisms employ a range of proteins to hydrolyze, transport, and utilize complex carbohydrates that serve as carbon and energy sources (1). In some cases, these proteins are very specific to particular carbohydrates, whereas in other situations they mediate the processing of a broader range of glycosides. For simple sugars, such as glucose, binding and transport proteins alone mediate substrate entry into specific intracellular anabolic and catabolic pathways (2). However, for complex carbohydrates, a series of glycoside hydrolases must first process the polysaccharide so that its backbone and side chain glycosidic linkages are hydrolyzed to the extent needed for binding, transport, and intracellular utilization. How specific organisms develop the capacity to utilize complex carbohydrates is not known, but this probably involves evolutionary pressures in addition to acquisition of this genetic potential through horizontal gene transfer events. In any case, a microorganism's capacity to utilize carbohydrates presumably reflects the availability of such substrates in its habitat. Therefore, insights into the repertoire of carbohydrate-active proteins in a given organism and how the expression of these proteins is regulated would reveal much about particular metabolic features in addition to how it interacts within a given ecosystem.
Thermotoga maritima is an obligately anaerobic, heterotrophic, hyperthermophilic bacterium originally isolated from geothermal features associated with Vulcano Island, Italy (3). Its capacity to utilize a wide range of simple and complex carbohydrates was confirmed by the inventory of glycoside hydrolases encoded in its genome (4). In fact, the T. maritima genome, despite its relatively small size, encodes the largest number of glycoside hydrolases of any bacterial or archaeal genome sequenced to date (see Fig. 1). From growth experiments and characterization of specific glycoside hydrolases (5), T. maritima is known to metabolize both polysaccharides and simple sugars, including carboxymethylcellulose, barley glucan, starch, galactomannan (5), xylan (6), pectin,¹ mannose, xylose, and glucose (2). In some cases, the proteins involved in the processing, transport, and utilization of these glycosides can be inferred from their apparent organization into operons in the T. maritima genome sequence (4), whereas in other cases such classification is not clear. Regulation of genes encoding specific carbohydrate-active proteins in T. maritima has only been studied to a limited extent thus far (5,7), and the coordinated regulation of related genes involved in polysaccharide utilization has not been examined.
Here, a targeted cDNA microarray, based on carbohydrate-active proteins from T. maritima, was used in conjunction with mixed model analysis (8,9) to explore issues related to saccharide utilization by this organism. Although T. maritima has been phylogenetically characterized as a primitive microorganism from an evolutionary perspective (10), results here suggest that it has versatile and discriminating mechanisms for regulating and effecting complex carbohydrate utilization. The relative importance of evolutionary processes and horizontal gene transfer (4) in developing its carbohydrate utilization capacity is not known, but T. maritima's ability to respond to various substrates in its growth environment underlies its ubiquity in global geothermal settings (11).

EXPERIMENTAL PROCEDURES
Construction of the Targeted cDNA Microarray-Open reading frames (total of 269) of known and putative genes related to sugar processing and other related metabolic functions were identified through BLAST (12) comparisons of protein sequences from the T. maritima MSB8 genome available on the World Wide Web at www.tigr.org/tigrscripts/CMR2/GenomePage3.spl?database=btm. DNA primers were designed with similar annealing temperatures and minimal hairpin formation using Vector NTI 7.0 (Informax, Bethesda, MD). The selected probes were PCR-amplified in a PTC-100 Thermocycler (MJ Research, Inc., Waltham, MA) using Taq polymerase (Roche Molecular Biochemicals) and T. maritima genomic DNA, isolated as described previously (5). The integrity and concentration of the PCR products were verified on 1% agarose gels. PCR products were purified to 100 ng/µl using 96-well QIAquick PCR purification kits (Qiagen, Valencia, CA), resuspended in 50% Me2SO, and printed onto CMT-GAPS aminosilane-coated microscope slides (Corning Glass) using a 417 Arrayer (Affymetrix, Santa Clara, CA) in the North Carolina State University Genome Research Laboratory (Raleigh, NC). Eight replicates of each gene fragment were printed onto each slide. The DNA was then attached to the slides by UV cross-linking using a GS GeneLinker UV Chamber (Bio-Rad) set at 250 mJ and baked at 75°C for 2 h.
Growth of Thermotoga maritima and RNA Isolation-Growth of T. maritima MSB8 cultures in artificial sea water was followed using optical density measurements and epifluorescence microscopic cell density enumeration, as described previously (5). Growth substrates glucose, mannose, xylose, β-xylan (birchwood), laminarin (Laminaria digitata), and starch (potato) were obtained from Sigma. Galactomannan (carob), glucomannan (konjac), carboxymethylcellulose, and β-glucan (barley) were obtained from Megazyme (Wicklow, Ireland). Growth substrates were prepared as described previously (5) and included in the medium at a final concentration of 0.25% (w/v). Substrate purities as provided by the manufacturers varied from 95 to 99%. To ensure minimum carryover between substrates, cells were grown for at least 10 passes on each carbon source using a 0.5% (v/v) starting inoculum before obtaining the growth curves. Specific growth rates on mono- and polysaccharide substrates were determined from the slopes of semilog plots of exponential cell growth versus time. Isolation of total RNA from T. maritima was performed on cells that were grown until early- to mid-exponential phase on the various growth substrates, as described in detail previously (5).
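The slope-based growth rate calculation described above can be illustrated with a minimal Python sketch. The function names and the cell counts are hypothetical and are not data from this study; the sketch simply fits a least-squares line to the natural log of cell density versus time and reports the slope as the specific growth rate.

```python
import math

def specific_growth_rate(times_h, cell_densities):
    """Least-squares slope of ln(cell density) versus time, in 1/h."""
    logs = [math.log(n) for n in cell_densities]
    count = len(times_h)
    t_mean = sum(times_h) / count
    y_mean = sum(logs) / count
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times_h, logs))
    sxx = sum((t - t_mean) ** 2 for t in times_h)
    return sxy / sxx

def doubling_time(mu):
    """Doubling time (h) implied by a specific growth rate mu (1/h)."""
    return math.log(2) / mu

# Hypothetical exponential-phase counts (cells/ml), for illustration only.
times = [0.0, 1.0, 2.0, 3.0]
counts = [1e7, 2e7, 4e7, 8e7]
mu = specific_growth_rate(times, counts)
```

Only points from the exponential phase should be passed to such a fit, since lag- and stationary-phase points flatten the slope.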
Labeling and Hybridization-First-strand cDNA was prepared from T. maritima total RNA using Stratascript (Stratagene, La Jolla, CA) and random hexamer primers (Invitrogen) by the incorporation of 5-(3-aminoallyl)-2′-deoxyuridine-5′-triphosphate (Sigma) as described elsewhere (13). The slides were scanned using a Scanarray 4000 scanner (GSI Lumonics, Billerica, MA) in the North Carolina State University Genome Research Laboratory. Signal intensity data were obtained using Quantarray (GSI Lumonics).
Experimental Design and Data Analysis-A loop design was constructed (see Fig. 2) to ensure reciprocal labeling for all 10 different experimental conditions. Replication of treatments, arrays, dyes, and cDNA spots allowed the use of analysis of variance (ANOVA)² models for data analysis. ANOVAs are especially appropriate for loop designs in which a large number of conditions are compared with one another, eliminating uninteresting reference samples and allowing for the collection of more information on experimental conditions (14). Mixed ANOVA models, in which some effects are considered fixed and others are considered random, have been used to re-examine published microarray data sets (9) and examine the effects of sex, genotype, and age on transcription in Drosophila melanogaster (8).
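As an illustration of dye-balanced loop designs in general, the following Python sketch builds a simple closed loop over the 10 growth substrates and checks that every condition is labeled once with each dye. The substrate ordering, the single closed loop, and the function names are assumptions made for this example; the actual design used here is the one shown in Fig. 2.

```python
def loop_design(conditions):
    """Return (cy3, cy5) pairs: array i compares condition i with condition i+1."""
    n = len(conditions)
    return [(conditions[i], conditions[(i + 1) % n]) for i in range(n)]

def dye_counts(arrays):
    """Count how often each condition is labeled with Cy3 and with Cy5."""
    counts = {}
    for cy3, cy5 in arrays:
        counts.setdefault(cy3, [0, 0])[0] += 1
        counts.setdefault(cy5, [0, 0])[1] += 1
    return counts

# Hypothetical ordering of the 10 substrates around the loop.
conditions = [
    "glucose", "mannose", "xylose", "xylan", "laminarin",
    "starch", "galactomannan", "glucomannan", "CMC", "barley glucan",
]
arrays = loop_design(conditions)
for cy3, cy5 in arrays:
    print(f"Cy3: {cy3:15s} Cy5: {cy5}")
```

Because each condition appears once in each channel, treatment effects are not confounded with dye effects, and no reference sample is needed.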
Using existing SAS procedures and customized Perl code, an automated data import system was developed to merge Quantarray intensity measurements, coordinate files generated by the array printer, and corresponding T. maritima locus numbers in a SAS data set (SAS Institute, Cary, NC). The data import system was verified through independent calculations in Excel (Microsoft, Seattle, WA). A linear normalization ANOVA model (9) of log base 2 intensities was used to estimate global variation in the form of fixed (dye, treatment) and random (array, pin within array, pin spot within array) effects and random error. The estimated effects calculated from this model were used to predict an expected intensity for each value. A residual was then calculated as the difference between a replicate's observed and predicted intensity and used as data to capture variation attributable to gene-specific effects after accounting for global variation. Gene-specific ANOVA models were then used to partition variation into gene-specific treatment effects, dye effects, and the same hierarchy of random effects described previously. Specifically, the model $r_{ijklmn} = \mu + D_i + T_k + A_j + A_j(P_l) + A_j(S_m P_l) + \varepsilon_{ijklmn}$ (D, dye; T, treatment; A, array; P, pin; S, spot) was fit separately to the residuals for each gene, and the resulting parameter estimates and S.E. values were then used for statistical inference.
Volcano plots were used to visualize interesting contrasts or comparisons between two treatments or two groups of treatments (9). A Bonferroni correction was utilized to adjust for the expected increase in false positives due to multiple comparisons (9). Genes meeting the Bonferroni significance criteria were selected for further study, ensuring that genes with inconsistent fold changes would be eliminated from further analysis. Two complementary approaches were utilized to cluster data from T. maritima growth on 10 saccharides. To visualize the relative expression levels of all genes within a treatment, hierarchical clustering was performed on least squares means calculated from the linear models for each sugar (Fig. 3). To visualize the expression pattern of each single gene across treatments, the least squares mean estimates were standardized using the mean and S.D. of the 10 least squares means estimates for a given gene. Each of the 10 least squares means estimates was standardized accordingly with the formula $Y_i = (X_i - \mu)/\sigma$, where $Y_i$ is the standardized least squares means variable, $\mu = \sum X_i/n$, and $\sigma = (\sum (X_i - \mu)^2)^{1/2}$. The standardized variable was then utilized for clustering (Fig. 3). For complete information on signal intensity, significance of expression changes, -fold changes, pairwise volcano plots, and hierarchical clustering for all of the genes included on the array, see the Supplemental Material.
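The two calculations described above, per-gene standardization of least squares means and a Bonferroni-adjusted significance cutoff, can be sketched in Python as follows. This is an illustration under stated assumptions, not the SAS code used in the study: the function names are hypothetical, the sketch uses the conventional sample standard deviation (n - 1 denominator), and the values of alpha and the number of tests are placeholders.

```python
import math

def standardize(lsmeans):
    """Center and scale one gene's least squares means across treatments."""
    n = len(lsmeans)
    mean = sum(lsmeans) / n
    # Assumption: sample standard deviation with an n - 1 denominator.
    sd = math.sqrt(sum((x - mean) ** 2 for x in lsmeans) / (n - 1))
    return [(x - mean) / sd for x in lsmeans]

def bonferroni_threshold(alpha, n_tests):
    """Per-test significance cutoff that keeps the family-wise error rate at alpha."""
    return alpha / n_tests

# Hypothetical least squares means for one gene on three substrates.
z = standardize([1.0, 2.0, 3.0])

# Hypothetical family of 10 comparisons tested at alpha = 0.05.
cutoff = bonferroni_threshold(0.05, 10)
```

Because every gene has the same number of treatments, a different constant in the denominator rescales all standardized profiles by the same factor and does not change their shape.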

RESULTS AND DISCUSSION
Design of Targeted cDNA Microarray-A targeted cDNA microarray for T. maritima was constructed that included 269 known and putative genes or about 15% of the total open reading frames in the T. maritima genome. This included the known set of genes related to glycoside utilization and modification (65 genes), proteolysis (40 genes), stress response, and proteolytic fermentation. Genes related to sugar transport (21 genes) or transcriptional regulation (69 genes) and 66 other genes of interest were also included.
Genes apparently related to glycoside utilization and modification in T. maritima include 41 glycoside hydrolases, 17 glycosyl transferases, 6 carbohydrate esterases, and 1 polysaccharide lyase. The corresponding encoded proteins have been classified into several families, based on amino acid sequence homology (15) (available on the World Wide Web at afmb.cnrs-mrs.fr/CAZY). There are over 130 T. maritima proteins with sufficient BLAST homology to be classified into transcriptional regulatory or signal transduction COG categories (16). These regulatory proteins have been assigned to families based on sequence homology; however, different proteins in the same families may have different DNA and substrate-binding specificities (17). Also, proteins placed in different families may share the same name because of their regulon composition, as in the case of the Escherichia coli and Bacillus subtilis xylR protein (18,19). Of the 69 transcription/transduction genes on the array, six share similarity with the   (20). Six members of the PurR/LacI superfamily (COG1609) were included (21) along with the T. maritima IclR transcriptional regulator, whose structure was recently solved (22). Several pairs of sensor histidine kinases and response regulators of putative two-component regulatory systems were included, as were regulators from the MarR (23), AraC (24), TroR (25), LytR (26), ArsR (27), and CspC (28) families. The T. maritima genome contains ~120 genes involved in oligopeptide/sugar transport. In the targeted microarray used here, 21 genes related to sugar transport were included on the basis of their proximity to the genes involved in glycoside utilization.
This targeted microarray was used to examine the differential response of T. maritima grown on a range of mono- and polysaccharides at its optimal growth temperature of 80°C. Growth conditions were analyzed based on an incomplete loop design (Fig. 2). Treatments in the loop design were balanced with respect to dyes so that treatment effects were not confounded with dye effects.
Substrate-dependent Differential Expression of Genes-Two hierarchical clusters are shown in Fig. 3 to summarize the expression patterns of 269 T. maritima genes during growth on 10 saccharides. The first cluster is based on least squares means and compares the normalized expression levels of all genes within each treatment condition. The second cluster is based on standardized least squares means for a single gene across all 10 treatments to show the effect of different treatments on the relative expression of a particular gene. The hierarchical clustering based on standardized least squares means revealed many cases of apparent co-regulation of genes within potential operons (29). Several sets of spatially distant gene strings were observed to cluster with similar expression profiles, suggesting the presence of regulons in the T. maritima genome. Representative clusters are displayed in Fig. 4. Overall expression levels of a number of genes remained consistently high or low regardless of the growth condition. These included constitutively expressed genes like TM0017 (pyruvate ferredoxin oxidoreductase) and TM0688 (glyceraldehyde-3-phosphate dehydrogenase) (30) as well as genes related to proteolytic activity. Both sets of genes with the corresponding known or putative functions are displayed in Fig. 5. Individual genes with high overall expression levels on only a single carbon source are indicated in Table II. Least squares means for all genes included in this study for all growth conditions are shown in Supplemental Table IV, along with the corresponding standardized values in Supplemental Table V. Below, gene regulation patterns within each functional category are examined for each monosaccharide and corresponding polysaccharide growth substrate.
Glucose and Glucan Polysaccharides-Backbone- and linkage-specific gene regulation was observed in the case of endoglycoside hydrolase genes for growth on α- and β-specific glucans. Growth on carboxymethylcellulose (CMC) (see cluster 4.1), a β-1,4-linked glucose polymer, induced genes encoding extracellular endoglucanases TM1525 (cel12B) and TM0305 (cel74), as well as the intracellular endoglucanase TM1524 (cel12A) and the intracellular cellobiosyl phosphorylase, TM1848. Examination of cluster I (Fig. 3) reveals that expression levels of cel74 were substantially lower than those of cel12A on glucan polysaccharides. Although the presence of a β-1,4-glucosidase gene (bglA) (accession number CAA52276) in T. maritima MSB8 has been reported (31), the corresponding protein sequence does not show homology to deduced sequences identified in the T. maritima MSB8 genome (4). Recently, bglA was reported to be present in T. maritima RQ2 (11). A switch in sugar backbone linkage from β-1,4 to β-1,3 (cluster 4.2) resulted in specific expression of genes encoding a laminarinase, TM0024 (lam16), as well as the corresponding exoglycosidase, TM0025 (cel3), with comparable overall expression levels. The ROK family protein TM0032 is located upstream of lam16 and cel3 and displays a similar expression pattern. These expression patterns suggest specific regulatory networks for each glucan that differ only by backbone linkage. The mixed linkage β-glucan from barley induced genes encoding β-1,4- as well as β-1,3-glucanases in addition to a related glycosyl transferase, cellobiosyl phosphorylase; however, over-
Starch, an α-glucan, induced expression of genes encoding the extracellular endo-acting α-amylase TM1840 (amy13A), a debranching pullulanase, TM1845 (pul13), and the intracellular exo-acting α-glucosidase, TM1068 (amy4A) (cluster 4.3). Expression of the α-glucuronidase TM0434 (32), previously classified as a putative α-glucosidase, also increased on starch. The cyclomaltodextrinase TM1835 was significantly induced in the presence of starch. This intracellular protein has been recently shown to have exoglycosidase activity at the reducing end in addition to maltodextrin decycling activity (33). The genes encoding a 4-α-glucanotransferase (TM0364) and maltodextrin glycosyl transferase (TM0767) were also up-regulated during growth on starch, the latter also being expressed during growth on glucose and laminarin. Interestingly, other genes putatively assigned for starch degradation, including an α-amylase, TM1650 (amy13B), as well as the putative α-glucosidases, TM0752 and TM1834 (amy4AC), remained unaffected by this substrate, suggesting a different intracellular role perhaps related to utilization of storage polysaccharides like glycogen.
The relative expression of the AraC family regulator TM1005 increased specifically on starch. AraC proteins have been shown to act as transcriptional activators for a variety of substrates (24). A related MarR regulator TM0710 showed an increase in relative expression on all glucose-containing polymers, suggesting that transcriptional activators also play a role in T. maritima response to glucans. The highly expressed ROK family protein TM0393 showed its highest expression on laminarin and starch. Although expressed at a low level, a DeoR family regulator TM1069 showed its highest relative expression on starch, like its B. subtilis and E. coli homologs, which play a major role in the regulation of sugar metabolism genes (34).
All endoglycoside hydrolases were repressed during growth on glucose. Although most endoglycoside hydrolases and certain exoglycosidases exhibited biochemically specific regulation in response to backbone and sugar linkage type, several other exoglycosidases displayed unexpected patterns. For instance, growth on CMC induced expression of genes encoding a fucosidase (TM0306), an α-xylosidase (TM0308), three β-galactosidases (TM0310, TM1192, and TM1195), an α-galactosidase, an α-mannosidase (TM1851), and an α-arabinofuranosidase (TM0281). Some of these genes (TM0306, TM0310, and TM1851) occur in the vicinity of genes responsible for the uptake and utilization of CMC (TM0305 and TM1848) and might be expected to be co-regulated. In other cases, up-regulation of these genes presumably reflects the complex nature of polysaccharide structures present in the natural environs of T. maritima. In any case, it is apparent that annotations of some of these exo-acting glycosidase genes in the T. maritima genome need to be reevaluated.
Overall expression levels of genes encoding the glycosyl transferases TM0392, TM0744, and TM0895 remained high on glucose as well as on all α- and β-glucan polysaccharides. Whereas TM0392 and TM0744 encode hypothetical proteins placed in family 4, TM0895 encodes a glycogen synthase. Growth on glucose also resulted in high overall expression levels of the gene encoding a lipopolysaccharide biosynthesis protein, TM0631. Relative expression levels of genes encoding other glycosyl transferases TM0757 (hypothetical protein), TM1405 (lipopolysaccharide biosynthesis-related), and TM0756 (galactosyltransferase) remained high on glucose as compared with the α- and β-glucan polysaccharides, despite their low overall expression levels on glucose. Growth on laminarin also resulted in high overall expression levels of lipopolysaccharide synthesis genes, TM0624 and TM0627.
Growth on CMC affected expression patterns of TM0300-TM0304 and TM1219-TM1223, two sets of genes annotated as oligopeptide ABC transporters. The former set of genes lies directly upstream of the gene encoding the extracellular endoglucanase TM0305 and displayed low overall expression levels compared with the latter, which is present in a gene string encoding the extracellular β-mannanase, TM1227. The latter set was also overexpressed during growth on barley glucan. Whereas it could be hypothesized that TM0300-TM0304 are involved in the transport of β-1,4-linked glucose oligomers and TM1219-TM1223 are involved in the transport of mixed linkage (β-1,4/β-1,3) glucose oligomers, the latter set remained unaffected during growth on the β-1,3-linked glucose polymer, laminarin.
The motif discovery software AlignACE (35) was used to examine the upstream sequences of genes co-regulated on CMC and barley glucan, revealing a strongly conserved palindromic element upstream of TM1848, TM1223, TM0299, TM0308, and TM1524 with consensus NTGaAAACATTTTCNN (see Table III). This motif shows strong similarity to the LacI family protein CcpA consensus CRE site within the Bacillus/Clostridium group of bacteria (36). Thus, the LacI family regulator TM1218, located downstream of the TM1219-TM1223 gene string, might be involved in local regulation of these ABC transporters and other members of the CMC degradation pathway proposed below. However, matches similar to this consensus are found upstream of genes and gene strings expressed on other sugars, raising the intriguing possibility that a global regulatory mechanism similar to CcpA-mediated carbon catabolite repression (CCR) in B. subtilis might be operating in T. maritima (37). Further work will assess this possibility.
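How closely a DNA motif approaches a perfect inverted repeat can be checked by pairing each base with its partner on the opposite half of the sequence. The Python sketch below is illustrative only and is not part of the AlignACE analysis: the function name is hypothetical, N is treated as a wildcard that matches any base, and the second 16-mer is included purely as an example of a perfect palindrome.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def mismatched_pairs(seq):
    """Count base pairs that break reverse-complement symmetry (N matches anything)."""
    s = seq.upper()
    count = 0
    for i in range(len(s) // 2):
        left, right = s[i], s[-1 - i]
        if "N" in (left, right):
            continue  # wildcard positions cannot break the symmetry
        if COMPLEMENT[right] != left:
            count += 1
    return count

consensus = "NTGaAAACATTTTCNN"          # consensus reported in the text
perfect_example = "TTGAAAgCGcTTTCAA"    # a perfectly palindromic 16-mer

print(mismatched_pairs(consensus))
print(mismatched_pairs(perfect_example))
```

A count of zero indicates a perfect palindrome; small nonzero counts indicate a near-palindromic site of the kind typically recognized by dimeric regulators.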
Based on specific gene expression patterns observed during growth on CMC, the biochemical characteristics of TM1524 (Cel12A) (38), TM1525 (38), TM1848 (39), and TM0305 (40), the cellular localization of these proteins (5), and the motifs described, a mechanism for the uptake and utilization of CMC by T. maritima can be proposed (see Fig. 6A).
Mannose and Mannan Polysaccharides-Regulation patterns relating the monosaccharide mannose to mannan polysaccharides differed considerably from those observed for glucose and glucan polysaccharides. A strong correlation in gene regulation was observed between cells grown on mannose and the complex polysaccharide galactomannan as well as the mixed polysaccharide glucomannan (cluster 4.4). Thus, unlike glucose, growth on mannose as well as mannan polysaccharides induced expression of the genes encoding the endoglycoside hydrolases, TM1227 (man5), TM1751 (cel5A), and TM1752 (cel5B). Whereas extracellular Man5 strictly degrades β-1,4 linkages between mannose residues, the latter two have been recently characterized as intracellular glucomannanases (5). The mixed polymer glucomannan induced expression of all of these genes in addition to genes encoding the endoglucanases TM0305 (cel74) and TM1524 (cel12A) and the gene encoding the cellobiosyl phosphorylase TM1848. Growth on galactomannan also induced the putative α-amylase, TM1650 (amy13B) and an arabinogalactan endogalactosidase, TM1201. The presence of galactose side chains in this substrate induced the expression of genes encoding the exo-acting debranching enzymes, TM1192 (gal36), TM1193 (gal42A), and TM1195 (gal42B). Whereas Gal36 hydrolyzes α-linked galactose residues attached to the mannan backbone, Gal42A-B cleave β-linked galactose end residues. These genes remained unaffected on the monosaccharide mannose as well as the mixed backbone polymer glucomannan, which lacks galactose side chain residues. The expression pattern of the LacI family protein TM1200 is similar to those of TM1191, TM1192, and TM1195, consistent with the location of TM1200 upstream of the other gene string (4). Thus, TM1200 may act as a local regulator of galactose genes in T. maritima.
[Table residue: TTGAAAgCGcTTTCAA; footnote a, not on array.]
FIG. 6. Proposed pathways for polysaccharide uptake and utilization. A, carboxymethylcellulose uptake and utilization. TM1524, TM1525, and TM0305 encode endo-β-1,4-glucanases. TM1219-TM1223 encode ABC transporters. TM1848 encodes a cellobiosyl phosphorylase. B, galactomannan uptake and utilization. TM1227 encodes an endo-β-1,4-mannanase. TM1746-TM1750 encode putative
The degradation of β-linked mannose oligosaccharides appears to be effected by the intracellular exo-acting glycosidase Man2 (TM1624), the gene for which was induced on both mannan polysaccharides and on mannose. Overall expression levels of genes encoding a hypothetical glycosyl transferase (TM0392) and glycogen synthase (TM0895) were comparable with those observed for glucose and glucan polysaccharides. In contrast, the gene encoding the hypothetical glycosyl transferase TM0744 remained repressed during growth on mannose. Very high overall expression levels of pectate lyase (TM0433) were observed on glucomannan. Growth on mannose as well as on mannan polysaccharides resulted in high overall expression levels of the genes encoding the ABC transporters TM1746-TM1750, putatively annotated as oligopeptide transporters and located upstream of genes encoding the glucomannanases, TM1751 and TM1752. Overall expression levels of genes annotated as sugar ABC transporters (TM1232-TM1234) located downstream of man5 (TM1227) remained low on all three substrates. Growth on glucomannan induced the transporter genes TM1220-TM1223, also co-regulated during growth on CMC and barley glucan. A proposed pathway for galactomannan utilization is shown in Fig. 6B, based on the specific gene expression patterns described above, as well as the biochemical properties and cellular localization of TM1227 (41), TM1624 (42), TM1192 (42), TM1751 (5), and TM1752 (5).
Among the transcriptional regulators that showed mannose-specific up-regulation were two ROK family proteins, TM0110 and TM0411 (cluster 4.7). Mlc, a ROK family member, is known to negatively regulate three phosphotransferase system (PTS) operons, including the mannose PTS manXYZ (43). However, PTS transporters have not been observed in T. maritima. The LacI family regulator TM1856 was also observed to have its highest expression on mannose, although its overall expression levels and those of TM0411 were low when compared with TM0110. TM0510, a putative manganese-dependent transcriptional regulator, and the response regulator TM0842 had high overall expression consistently across all conditions but showed their highest relative expression levels on mannose. Another response regulator, TM0126, displayed lower overall expression levels but also showed up-regulation on mannose. Whereas the transcriptional regulators described above were largely mannose-specific, the ROK family protein TM1224 and the ribose phosphate isomerase family protein TM1228 (44) were expressed on mannose and mannan polysaccharides. Both TM1224 and TM1228 lie in the same gene string as the endoglycoside hydrolase gene, TM1227 (man5).
Xylose and β-Xylan-Similar to regulation patterns observed for mannose and mannan polysaccharides, growth on xylose and β-xylan revealed several sets of co-regulated genes (cluster 4.5). Genes encoding the extracellular endo-acting xylanases TM0061 (xyl10A) and TM0070 (xyl10B) and the intracellular exo-acting xylosidase TM0076 (xyl13) displayed comparable expression levels on both substrates and were also up-regulated on the monosaccharide mannose. Very high overall expression levels of the gene encoding the α-glucuronidase (45) TM0055 were observed exclusively during growth on β-xylan, whereas overall expression levels of the gene encoding the acetyl xylan esterase TM0077 were similar across several polysaccharides (CMC, β-1,3/4 glucan, starch) and monosaccharides (mannose and xylose). Relative expression levels of the family 4 α-glucuronidase gene TM0434 (32) were high on both xylose and β-xylan, whereas its overall expression levels were lower than those of other related genes. As mentioned above, TM0434 was previously classified as a putative α-glucosidase and showed high relative expression levels on starch. The other acetyl xylan esterase, TM0435, had high relative expression levels on xylose, β-xylan, and glucomannan, although its overall expression levels were low on these substrates. Other genes encoding exo-acting glycosidases that were expressed on both xylose and β-xylan included a putative α-glucosidase (TM0752) and an α-mannosidase (TM1231). Whereas TM0300-TM0303 were highly expressed on xylose, TM0430-TM0432 were expressed on β-xylan. The proposed pathway for the utilization of β-xylan is shown in Fig. 6C.
Genes up-regulated on xylan and konjac glucomannan are located in at least two main genomic regions (cluster 4.6). The region located between TM0055 and TM0077 revealed some striking compositional similarities to the glucuronic acid utilization cluster of Bacillus stearothermophilus T-6 (46), consistent with the identification of side chain substitutions in the form of acetyl, arabinosyl, ferulic acid, and 4-O-methyl-α-glucuronic acid in the β-xylan backbone (46). Common features include the presence of an IclR/KdgR family regulatory protein upstream of the glucuronic acid metabolism genes kdgAK and uxaBA, and divergently transcribed from uxaC, and the colocalization of extracellular and intracellular xylanases within the cluster (4,46). The second genomic region between TM0430 and TM0439 contains gene strings sharing sequence homology with genes involved in the related process of pectin degradation. 1 The ABC transporter subunits TM0430-TM0432 are located upstream of TM0433 (pectate lyase), whereas TM0434-TM0443 lie downstream within a gene string whose members are all separated by 20 bases or less. TM0439 is annotated as a GntR family regulator and is 35% identical and 51% similar to the gluconate repressor GntR of B. subtilis (47).
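The notion of a gene string whose members are separated by 20 bases or less can be illustrated with a minimal sketch. It is not the procedure used in this study; it only shows one way such a spacing criterion could be applied. The function name `group_gene_strings`, the `Gene` record, and all locus names and coordinates below are hypothetical, and strand orientation is ignored for simplicity.

```python
from typing import List, NamedTuple


class Gene(NamedTuple):
    locus: str   # hypothetical locus label
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive


def group_gene_strings(genes: List[Gene], max_gap: int = 20) -> List[List[str]]:
    """Group genes into runs in which neighbours lie <= max_gap bases apart."""
    ordered = sorted(genes, key=lambda g: g.start)
    strings: List[List[str]] = []
    prev_end = None
    for gene in ordered:
        # Gap = number of bases strictly between the two genes;
        # overlapping genes give a negative gap and stay in the same string.
        if prev_end is not None and gene.start - prev_end - 1 <= max_gap:
            strings[-1].append(gene.locus)
        else:
            strings.append([gene.locus])
        prev_end = gene.end if prev_end is None else max(prev_end, gene.end)
    return strings


# Made-up coordinates, for illustration only (not T. maritima annotations).
example = [
    Gene("geneA", 100, 400),
    Gene("geneB", 410, 900),    # 9 bases after geneA
    Gene("geneC", 921, 1500),   # 20 bases after geneB
    Gene("geneD", 1600, 2000),  # 99 bases after geneC: starts a new string
    Gene("geneE", 1995, 2500),  # overlaps geneD
]
gene_strings = group_gene_strings(example)
print(gene_strings)
```

With these invented coordinates, the first three genes form one string and the last two form another. A fuller treatment would also require that neighbouring genes lie on the same strand.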
gesting involvement in the regulation of xylose and xylan-related genes. TM0949 displays 30% identity with E. coli XylR, the positive regulator of the xylFGH ABC transporter operon, and showed a particularly strong induction on xylose. ABC transporters located downstream of TM0949 show similarity to xylFGH, and it remains to be seen whether these genes, annotated as ribose ABC transporters, are induced during growth on xylose as observed in E. coli (18,48). The CRP-like transcriptional regulator TM1171 was up-regulated primarily on xylose but to a lesser extent on CMC. Experimental evidence in Thermotoga neapolitana suggests that CCR in that species is not dependent on cAMP (49). If this is also the case in T. maritima, the function of the CRP protein may differ from that of E. coli CRP, whose role in global catabolite repression has been well documented (50). TM0808, the single ROK family protein that showed increased expression on xylan and xylose, is located upstream of a putative hydrolase (TM0809), which showed expression preferences for xylan and starch.
Summary. The genome sequence of T. maritima encodes the highest number of known glycoside hydrolase genes among hyperthermophilic bacterial and archaeal genome sequences reported to date (Fig. 1) (51). The data set resulting from this study provides a rich resource for characterizing putative regulatory mechanisms in T. maritima that might have been proposed from genome sequence data (4). Previous studies on T. neapolitana have reported the classical lactose-glucose diauxie, operating independently of cAMP (49). In the case of β-galactoside transport and metabolism in T. neapolitana, transport system-based regulation through inducer exclusion or expulsion has been ruled out (52). The results presented here indicate the presence of carbon catabolite repression during growth on glucose. However, growth on this substrate was observed to be slower than on any of the glucan polysaccharides.
Growth on the other monosaccharides, xylose and mannose, did not repress the genes responsible for the uptake and utilization of the corresponding polysaccharides, pointing to similar modes of regulation between these mono- and polysaccharide substrates. Studies on glucose uptake by T. neapolitana propose that transport of this monosaccharide is energized by ion gradients generated by ATP, derived from substrate level phosphorylation (2). It remains to be seen whether ABC transporters identified in this study that were expressed during growth on mannose and mannan polysaccharides follow similar mechanisms. The observation that the expression patterns of transcriptional regulators group with genomic neighbors, although cDNA spots were randomized on the array, provides verification that systematic biases that can obscure co-regulation have been removed from the microarray data during analysis. Results from this study suggest sugar-specific regulation patterns for members of several large protein families in T. maritima, including the ROK (20) and LacI (21) families, that were not apparent from sequence data alone. Protein sequence conservation, genomic organization, and microarray expression patterns can be combined to predict local regulatory sequences for many T. maritima proteins. 3 Sequence similarity of several T. maritima LacI family proteins to the CcpA protein involved in CCR in B. subtilis, coupled with the CRE-like (catabolite-responsive element) sequence motif found upstream of several sugar-specific operons, suggests that a type of global carbon catabolite repression may be occurring in T. maritima. 3 This mechanism is likely to differ from that operating in B. subtilis, since the T. maritima genome does not contain a strong homolog to either HPr or Crh, and no PTS systems have been identified in this organism (4,53).
In Streptomyces coelicolor, which also lacks HPr, the ROK family protein glucokinase has been implicated in yet another global regulatory mechanism of CCR (37). The expression of the T. maritima CRP protein homolog observed in this microarray study raises even more questions about available mechanisms for CCR in T. maritima.
Further studies using full genome microarrays and biochemical studies to unravel the pathways for intracellular metabolism in this organism with presumably primitive traits are currently under way.