The Shwachman-Bodian-Diamond Syndrome Protein Family Is Involved in RNA Metabolism*

A combination of structural, biochemical, and genetic studies in model organisms was used to infer a cellular role for the human protein (SBDS) responsible for Shwachman-Bodian-Diamond syndrome. The crystal structure of the SBDS homologue in Archaeoglobus fulgidus, AF0491, revealed a three domain protein. The N-terminal domain, which harbors the majority of disease-linked mutations, has a novel three-dimensional fold. The central domain has the common winged helix-turn-helix motif, and the C-terminal domain shares structural homology with known RNA-binding domains. Proteomic analysis of the SBDS sequence homologue in Saccharomyces cerevisiae, YLR022C, revealed an association with over 20 proteins involved in ribosome biosynthesis. NMR structural genomics revealed another yeast protein, YHR087W, to be a structural homologue of the AF0491 N-terminal domain. Sequence analysis confirmed them as distant sequence homologues, therefore related by divergent evolution. Synthetic genetic array analysis of YHR087W revealed genetic interactions with proteins involved in RNA and rRNA processing including Mdm20/Nat3, Nsr1, and Npl3. Our observations, taken together with previous reports, support the conclusion that SBDS and its homologues play a role in RNA metabolism.

The predominant genetic event that underlies SBD syndrome appears to be a gene conversion event between the SBDS locus and a pseudogene that is 97% similar in sequence to SBDS, but is predicted to encode a non-functional protein (1,5). Sequence analysis of disease-associated alleles has identified more than 20 different mutations in affected individuals (1,5,6). Two mutations, both predicted to lead to truncated gene products, account for over 95% of these mutant alleles. The first, and most common mutation, 258+2T→C (84Cfs3), can be found in both homozygous and heterozygous affected individuals and is predicted to cause partial loss-of-function. The second mutation, 183-184TA→CT (K62X), is inevitably found together with other mutations and is never homozygous, suggesting that homozygosity for this allele is lethal. The remaining 5% of alleles are point mutations that map mostly to the 5′-end of the coding sequence.
Although the molecular function of SBDS is unknown, the predicted SBDS protein is evolutionarily conserved and has apparent homologues in Archaea, plants, yeast, and other lower eukaryotes, suggesting that it may have a fundamental, conserved biochemical role. Several lines of circumstantial evidence suggest SBDS has a role in RNA metabolism. First, in Archaea, the gene is located in an operon that encodes, among other proteins, RNA-processing enzymes (7). Second, in yeast, the gene for the SBDS homologue (YLR022C) clustered with RNA-processing enzymes in transcription profiling experiments (8). Third, although the coding sequence is divided into three blocks of sequence conservation in most organisms, including humans, several plant SBDS homologues contain a fourth C-terminal region that is predicted to be an RNA binding domain (1).
In an effort to determine the molecular and cellular functions of SBDS, we initiated a series of studies in model organisms. As a part of our structural genomics project, we used x-ray crystallography to determine a high resolution structure of the Archaeal SBDS homologue, AF0491 in Archaeoglobus fulgidus, and NMR to discover an unanticipated structural homologue of the N-terminal domain of SBDS in Saccharomyces cerevisiae (YHR087W). The biochemical and genetic links to RNA metabolism for both the S. cerevisiae SBDS sequence homologue, YLR022C, and the more distantly related homologue, YHR087W, suggest a common functional role for this new structural family.

EXPERIMENTAL PROCEDURES
Cloning, Purification, and Crystallization of AF0491-The AF0491 gene was subcloned and expressed, and its product was purified and screened for crystallization as described previously (9). Crystals for x-ray diffraction data collection were obtained from hanging drop vapor diffusion conditions containing 2 μl of the Se-Met derivative of the protein plus 2 μl of 0.2 M sodium acetate, 0.1 M sodium cacodylate at pH 6.4, and 20% polyethylene glycol 3350 over 2-5 days at 21°C. The crystals were flash-frozen in crystallization buffer supplemented with 17% glycerol.
X-ray Diffraction and Structure Determination-The protein was crystallized in the P1 space group with the unit cell parameters of a = 33.848, b = 44.189, c = 61.295, α = 92.575°, β = 117.995°, γ = 109.771°. The crystals contained one independent molecule per unit cell. A 2.0-Å three-wavelength MAD diffraction data set was collected from a selenomethionine-containing crystal and then processed with the HKL2000 package (10). Five selenium sites (of a total of six) were found by direct methods using the SnB program (11). The experimental phases were calculated and solvent flattened by SHARP (12). Of a total of 234 amino acids, 197 were autotraced and partially docked in the sequence by ARP/wARP (13). The remaining amino acids were manually built into the experimental map using the program O (14). The structure was refined, and the water substructure generated, in CNS (15). The final model contains all amino acids in the interval 2-234 (except for a disordered loop comprising residues 115-117) and 145 water molecules. PROCHECK (16) indicated that all the parameters are within, or better than, estimated limits for 2.0-Å resolution structures in the Protein Data Bank. A Ramachandran plot shows 92.6% of all residues are in the most favored regions, 6.4% are in favored regions, one residue is in the generously allowed, and one (0.3%) is in a disallowed region. The final statistics are presented in Table I.
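As a rough consistency check on the reported cell, the volume of a triclinic unit cell can be computed from the six cell parameters with the standard formula V = abc·sqrt(1 − cos²α − cos²β − cos²γ + 2 cosα cosβ cosγ). The sketch below is illustrative only and is not part of the original analysis. It assumes the cell lengths are in Å (the text does not state units), and the molecular mass used for the Matthews coefficient is a crude estimate of 234 residues at about 110 Da each, not a value taken from the paper. The function names are hypothetical.

```python
import math


def triclinic_cell_volume(a, b, c, alpha_deg, beta_deg, gamma_deg):
    """Volume of a general (triclinic) unit cell.

    Lengths are taken to be in angstroms and angles in degrees.
    """
    ca, cb, cg = (math.cos(math.radians(x))
                  for x in (alpha_deg, beta_deg, gamma_deg))
    return a * b * c * math.sqrt(
        1.0 - ca * ca - cb * cb - cg * cg + 2.0 * ca * cb * cg
    )


def matthews_coefficient(cell_volume, molecules_per_cell, mol_weight_da):
    """Crystal volume per unit of protein mass (A^3/Da)."""
    return cell_volume / (molecules_per_cell * mol_weight_da)


# Cell parameters as reported for the AF0491 crystal (space group P1).
# ASSUMPTION: lengths are in angstroms.
AF0491_CELL = (33.848, 44.189, 61.295, 92.575, 117.995, 109.771)

# ASSUMPTION: mass estimated as 234 residues x ~110 Da per residue.
ASSUMED_MASS_DA = 234 * 110.0

if __name__ == "__main__":
    volume = triclinic_cell_volume(*AF0491_CELL)
    # One independent molecule per unit cell, as stated in the text.
    vm = matthews_coefficient(volume, 1, ASSUMED_MASS_DA)
    print(f"cell volume: {volume:.0f} A^3")
    print(f"Matthews coefficient (assumed mass): {vm:.2f} A^3/Da")
```

With one molecule per cell in P1, the Matthews coefficient is simply the cell volume divided by the protein mass, so any error in the assumed mass carries over directly to the result.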
Cloning and Purification of YHR087W-The YHR087W gene was PCR-amplified from genomic DNA and inserted into a pET15b (Novagen) vector. This construct yields the protein with an N-terminal His6 tag and thrombin cut site. The protein expression and purification methods are described by Yee et al. (17). The His6 tag was not cleaved, leaving an extra 20 residues at the N terminus (MGSSHHHHHHSSGLVPRGSH). NMR samples of ~1.5 mM uniformly ¹⁵N/¹³C-labeled protein were prepared in 10 mM NaOAc, 300 mM NaCl, 10 mM dithiothreitol, 10 μM Zn²⁺, 1 mM benzamidine, 1× inhibitor mixture (Roche Applied Science), and 0.01% (w/v) NaN₃ in 10% (v/v) ²H₂O/H₂O at pH 5.0. This buffer was not optimized to determine the importance of any components. Samples were placed in 5-mm Shigemi susceptibility matched NMR tubes.
Calculation and Analysis of YHR087W Structural Ensemble-NOE distance restraints had uniform lower bounds of 1.8 Å and upper bounds of 2.8, 3.2, 4.0, or 5.0 Å. Hydrogen bond restraints were derived from amide proton D₂O exchange data. Amide ¹H-¹⁵N HSQC crosspeaks still present 30 min after dissolution of a lyophilized sample in D₂O were given bounds of 1.8-2.5 Å for the HN-O distance and 2.8-3.5 Å for the N-O distance, provided preliminary structural ensembles clearly indicated the correct acceptor atom. Dihedral restraints for phi were derived from the HNHA experiment (20) and had bounds of −55 ± 30 degrees for helical residues with J < 5 Hz and −120 ± 50 degrees for extended residues with J > 7.5 Hz. Dihedral restraints for psi had values of −47 ± 30 for helical residues and 140 ± 50 for extended residues. Psi restraints were added only for residues in helices and β-sheets and only after consideration of amide-α NOE ratios (HN(i)-Hα(i) versus HN(i)-Hα(i-1)), α-carbon chemical shifts, and the evident secondary structure propensities in preliminary ensembles of structures.
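The restraint rules above are simple enough to express as a lookup. The following sketch, with hypothetical function and constant names, shows how a measured HNHA coupling constant could be mapped to a phi range and how an NOE upper bound could be paired with the uniform lower bound. It is a minimal illustration of the stated thresholds, not the procedure actually used to build the restraint files. The manual checks applied before adding psi restraints (NOE ratios, α-carbon shifts, preliminary ensembles) are not modeled, and the rule that assigns a given NOE to a particular upper-bound bin is not given in the text, so the bin is supplied by the caller.

```python
from typing import Optional, Tuple

# Values taken from the text: NOE bounds in angstroms, dihedrals in degrees.
NOE_LOWER = 1.8
NOE_UPPER_BINS = (2.8, 3.2, 4.0, 5.0)
PHI = {"helix": (-55.0, 30.0), "extended": (-120.0, 50.0)}
PSI = {"helix": (-47.0, 30.0), "extended": (140.0, 50.0)}


def classify_by_coupling(j_hz: float) -> Optional[str]:
    """Classify a residue from its HN-HA coupling constant (Hz).

    Couplings between 5 and 7.5 Hz are left unclassified.
    """
    if j_hz < 5.0:
        return "helix"
    if j_hz > 7.5:
        return "extended"
    return None


def phi_restraint(j_hz: float) -> Optional[Tuple[float, float]]:
    """Return the (min, max) phi range in degrees, or None if unrestrained."""
    kind = classify_by_coupling(j_hz)
    if kind is None:
        return None
    center, tolerance = PHI[kind]
    return (center - tolerance, center + tolerance)


def noe_bounds(upper: float) -> Tuple[float, float]:
    """Pair a chosen upper bound with the uniform 1.8 A lower bound."""
    if upper not in NOE_UPPER_BINS:
        raise ValueError(f"upper bound must be one of {NOE_UPPER_BINS}")
    return (NOE_LOWER, upper)


if __name__ == "__main__":
    for j in (4.0, 6.0, 8.0):
        print(j, classify_by_coupling(j), phi_restraint(j))
```

Leaving intermediate couplings unrestrained mirrors the text, which defines phi bounds only for J below 5 Hz and above 7.5 Hz.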
Structures were calculated with NIH-Xplor (21) using distance geometry and simulated annealing. The routines dg_sub_embed, dg_full_embed, and dgsa were used as provided except that in dgsa, an initial temperature of 2,000 K was used with 30,000 high temperature steps and 200,000 cooling steps. Sum averaging was used for methyl groups and methylene proton pairs. The structural ensemble was analyzed with PROCHECK-NMR (22). Surface electrostatic features of the protein were examined using GRASP (23). Structure similarity searches using Dali (24) and VAST (25) were conducted using the first structure from the ensemble as the representative structure. Structural statistics are presented in Table II.
Sequence Analysis of the Conserved N-terminal Region of the SBDS Family-The alignment of the orthologues of human SBDS was used to search for distant protein families via intermediate searches (26) using global hidden Markov model profiles (using hmmsearch of HMMer; hmmer.wustl.edu/) (27). To improve the profile quality we followed two approaches: first, BLAST searches against unfinished genomes (28), and second, additional searches against EST databases (29), using NAIL to view and analyze the HMMer results (30). This sequence enrichment improved the quality of the profile that was used to perform the searches against the non-redundant protein databases. The alignment in Fig. 3 was produced with HMMer (27) and Belvu (www.sanger.ac.uk/Software/Pfam/help/belvu_setup.shtml), and the phylogenetic tree was produced using ClustalW (31).
Purification of TAP-tagged Proteins and Mass Spectrometry-TAP-tagged (tandem affinity purification) (32) proteins were purified from extracts of yeast cells as previously described (33) on IgG and calmodulin columns. The purified proteins were separated by SDS-PAGE on gels containing 10% polyacrylamide, and the proteins were visualized by silver staining. Protein bands were digested with trypsin and peptide samples were spotted onto a target plate with a matrix of α-cyano-4-hydroxycinnamic acid (Fluka). MALDI-TOF mass spectrometry analysis was conducted utilizing a Reflex IV instrument (Bruker Daltonics, Billerica, MA) in positive ion reflectron mode. Tandem mass spectrometry on a Finnigan LCQ-Deca instrument was carried out as previously described (33) on an aliquot of each preparation that was digested with trypsin in solution after elution of the tagged protein complex from the calmodulin column.
Synthetic Genetic Array Analysis (SGA)-SGA analysis was carried out as previously described (34), except with a miniarray of 383 deletion strains.

RESULTS
Crystal Structure of AF0491, the Archaeal Homologue of SBDS-The SBDS protein is conserved in Archaea and Eukarya. A 2.0-Å structure of AF0491, the SBDS homologue from A. fulgidus, was solved from a single selenomethionine-containing crystal. The structure was refined to an R factor/R free of 21.9/27.0%. The data collection and refinement details are provided under "Experimental Procedures" and in Table I. AF0491 comprises three domains (Fig. 1). The AF0491 N-terminal domain (residues 1-86) is an αβ domain that adopts a novel fold. It consists of a five-stranded antiparallel β-sheet in which strands β2 and β3, as well as β4 and β5, are connected by pairs of helices. The secondary structural elements are thus arranged as follows: β1-β2-α1-α2-β3-loop-β4-α3-α4-β5 (Fig. 2B). On strand β3 on the exposed side of the β-sheet, there is an irregular two residue β-bulge, in which the hydrogen bonding pattern of the antiparallel strands is disrupted, and the normal backbone geometry is distorted to accommodate the two additional residues. There is a loop at one end of the β-sheet that bends in the direction of the bulge, forming a cupped incursion in the surface. It appears that this fold is new, as no similar structures were identified with Dali or VAST. Structure-based sequence alignments showed that this domain corresponds to residues 10-95 in human SBDS.
The middle domain of AF0491 (residues 87-160) is a winged helix-turn-helix, a common fold associated with DNA binding (35). In the context of the human SBDS, this domain corresponds to residues 98-169. The C-terminal domain of AF0491 (residues 161-234), which corresponds to residues 170-241 in the human protein, comprises a four-stranded β-sheet that is buttressed on one side by two helices. This is also a common fold, sharing structural homology with the RNA recognition motif (RRM).
More than 20 different mutations have been identified in SBD syndrome patients (1, 5, 6) (Fig. 1A). Most of the mutations that alter surface residues are located in the N-terminal half of the protein, where many of the conserved residues are also located. An analysis of disease-linked mutations can be found in the report by Shammas et al. (6).
YHR087W and the AF0491 N-terminal Domain-As part of an ongoing structural genomics project in yeast, we used NMR spectroscopy to determine the structure of YHR087W from S. cerevisiae (Fig. 2A). This protein did not have a previously known structure or function; however, the YHR087W orthologue in S. pombe is expressed during sporulation and environmental stress (36,37). The structure of YHR087W displayed striking similarity to that of the N-terminal domain of AF0491 (Fig. 2B), a similarity that was first recognized by Alexey Murzin (A. Murzin, personal communication). This was confirmed using pairwise comparisons: Dali returned an alignment of 79 residues with an r.m.s.d. of 2.1 Å for the CA atoms, whereas VAST reported 68 aligned residues with an r.m.s.d. for CA atoms of 2.3 Å, a score of 6.5, and a p value of 0.002 (Fig. 1A). Consequently, we searched for sequence similarity between the two proteins using HMMer. Profiles of the N-terminal conserved region of the SBDS family scored against the YHR087W sequence with an E-value of 0.042. Reciprocally, the profile of the YHR087W sequence with its orthologues detected the SBDS family with an E-value of 0.018. Statistically significant E-values connected all the sequences contained in both families. None of these HMMer profile searches retrieved any new unrelated sequences and, as stated above, reciprocal searches produced convergent results. Thus, YHR087W is a distant sequence homologue of the AF0491 N-terminal domain and, by extension, of both the human SBDS sequence and of YLR022C, an SBDS sequence homologue that we identified in S. cerevisiae (Figs. 1A and 3). The runs of hydrophobic residues are in good agreement between both alignments, and there are also several conserved residues. The grouping by taxonomy is in agreement with the structure of the phylogenetic tree derived from the multiple sequence alignment, with all sequences in the branch of YHR087W being found exclusively in fungi (with the exception of the Zea mays sequence, which could be either a true case of horizontal transfer or the result of a fungal contamination of the corresponding EST library).
The structural similarity between YHR087W and the AF0491 N-terminal domain (AF0491-N) extends over the entire structure, including conservation of the irregular β-bulge in β3 and the loop at one end of the β-sheet. Conservation of these structural details, particularly within such an unusual fold, likely indicates a shared function. There is a 9-residue loop inserted between β3 and β4 of YHR087W; this loop contains 3 residues in AF0491-N (Fig. 1A). In the NMR structure, this loop is structurally underdetermined because of a lack of resonance assignments and NOE data, most likely because of intermediate time scale conformational exchange.
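The r.m.s.d. values quoted for the pairwise comparisons measure the average displacement between equivalent CA atoms after superposition. The sketch below shows only that final step, under the assumption that the two coordinate lists are already paired residue by residue and already superposed. Dali and VAST perform their own alignment and superposition, which are not reproduced here, and the function name is hypothetical.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(coords_a: Sequence[Coord], coords_b: Sequence[Coord]) -> float:
    """Root-mean-square deviation between paired, pre-superposed atoms.

    Both inputs must list equivalent atoms in the same order.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        # Accumulate the squared distance between each atom pair.
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


if __name__ == "__main__":
    # Toy coordinates: the second set is the first shifted by 2 A along z.
    set_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    set_b = [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0)]
    print(rmsd(set_a, set_b))
```

Because the value depends on which residues are counted as equivalent, the two programs can report different r.m.s.d. values for the same pair of structures, as they do here with 79 and 68 aligned residues.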
The electrostatic properties of YHR087W and AF0491-N are also similar (Fig. 4). One side of both proteins, formed by the side of helix α2, the irregular bulge region in strand β3 and the edge of β4 as well as by the N-terminal end of helix α4 and the loop preceding it, is particularly rich in acidic residues and devoid of basic residues, whereas the other sides of the structures have a mixture of acidic, basic, and uncharged side chains. The conservation of electrostatic surface potential is explained by the presence of a number of conserved or identical residues. Among the conserved residues are two glutamates, which contribute to the acidic side of the structure, although additional acidic residues, which are not conserved, also contribute to this feature in both structures. Notably, there is a conserved basic region (Lys31/Lys32: YHR087W/AF0491-N), which, in AF0491, is located in the large cleft formed by the three domains. Also present are several conserved hydrophobic residues, probably involved in stabilization of the fold.
FIG. 1. Structure of AF0491 from A. fulgidus. A, sequence alignment of structural (YHR087W) and sequence homologues of human SBDS. SBDS mutations are indicated above the sequences with the most common mutations in red. Structurally aligned YHR087W residues are in uppercase letters and unaligned residues are lowercase. YHR087W residues in red or green are structurally aligned residues that are identical or conserved, respectively. B, stereo image of AF0491, showing its three domain structure.
Functional Analysis of YLR022C and YHR087W, SBDS Homologues in S. cerevisiae-The biochemical and cellular functions of both YLR022C and YHR087W are unknown. Strains lacking the YLR022C gene are not viable, indicating it is an essential gene, whereas strains lacking YHR087W are viable (38). Thus, in an effort to infer the functions of YLR022C and YHR087W, and by inference those of SBDS, we turned to the genomic and proteomic analyses that have been used with such success in yeast.
YLR022C was tagged at the C terminus with a TAP tag by recombining a tagging cassette directly into the chromosomal YLR022C locus. The protein was then purified by tandem affinity chromatography (Fig. 5) and associated proteins were identified using LC/MS/MS mass spectrometry. YLR022C copurified with small amounts of a large number of polypeptides, most of which are linked to the processing of rRNA and many of which are known components of the 60 S particle. Because ribosomal proteins are abundant, there is a concern that they might be adventitiously associated with YLR022C. However, we believe that this is unlikely because we have now purified more than 3,000 yeast proteins, and only a limited number have co-purified with the 60 S particle. Therefore, these data support a role for YLR022C in ribosomal processing.
The prefix identifies the database from which the sequence originates: sw, Swiss-Prot; sp, spTrEMBL; est, consensus sequences manually reconstructed by assembling highly similar expressed sequence tags from a given species (conceptual translations) obtained from the EST database at NCBI (28); gb, genome BLAST server at NCBI (28). Branch coloring indicates the taxonomy of the sequence: blue, yeasts; green, plants; red, other eukaryotes; gray, Archaea. The color scheme of the alignment indicates average BLOSUM62 score (correlated to amino acid conservation) in each alignment column: cyan, greater than 3.5; light red, between 3.5 and 1.5; light green, between 1.5 and 0.8.
To explore whether the sequence and structural similarity between YHR087W and YLR022C corresponded to a functional similarity, we performed genetic and protein interaction studies with YHR087W. When YHR087W was tagged with a TAP tag and purified using tandem affinity chromatography, we did not detect any co-purifying proteins (data not shown). However, since YHR087W is a non-essential gene, we were able to explore genetic interactions between YHR087W and other yeast genes using SGA analysis; results were confirmed by tetrad analysis. A strain deleted for YHR087W was crossed with a miniarray of 383 other deletion strains, each lacking a protein implicated in RNA metabolism. Deletion of YHR087W caused marked synthetic lethality when combined with deletions of several other genes including those that encode the NatB complex (Mdm20/Nat3), which is required for the acetylation of ribosomal proteins (39), and Nsr1, a nucleolar protein involved in the synthesis of 18 S rRNA and its 20 S precursor (40,41). Nsr1 has two RNA recognition domains and is a member of the GAR (glycine/arginine-rich repeats) family of proteins (42). YHR087W also interacted genetically with each of Npl3, Air2, and Yra2, which others have shown to interact with each other both by GST pull-down and two-hybrid-based methods (43,44). Npl3 is a protein involved in 18 S and 25 S rRNA processing, export of RNA from the nucleus, and import of proteins into the nucleus (45-48). Npl3 is also associated with the U1 snRNP and is predicted to have two RNA recognition domains (46,49). Air2 is a RING-type zinc finger domain protein, and Yra2 is a protein that associates with RNP complexes (43,44). Therefore, the genetic interactions of YHR087W support a role for the protein in RNA processing.

DISCUSSION
Shwachman-Bodian-Diamond syndrome is the second most common cause of pancreatic insufficiency in children, after cystic fibrosis (50). The syndrome is caused by mutations in the SBDS gene on chromosome 7 (1). Our studies of SBDS homologues in Archaea and yeast provide experimental evidence for a role for SBDS in RNA metabolism.
We have solved the crystal structure of AF0491, the SBDS homologue in A. fulgidus, revealing a three domain protein.
The C-terminal domain comprises a four-stranded β-sheet with two helices on one side, a commonly occurring fold. Although the domain shares structural homology with the RRM, this is a common fold, making it difficult to infer a function. RRMs are found in many RNA-binding proteins, including translation factors, poly(A)-binding proteins, and proteins associated with pre-mRNA and pre-rRNA processing (51). This domain has also been found in DNA-binding proteins and can be involved in protein-protein interactions (52-54).
The central domain of the protein adopts another common fold, the winged helix-turn-helix (wHTH). Proteins with an HTH domain are abundant in all Archaea, with the wHTH fold being the most common (55), and so it is difficult to infer a molecular function. Although HTH domains are widely used in DNA binding and have also been identified in RNA-binding proteins (56-58), a role in nucleic acid binding is not supported since the surface of AF0491 does not have the general basic character expected for such a function. Part of this domain may be involved in protein-protein interactions, as with Kluyveromyces lactis HSF, another wHTH protein. Unlike other wHTH proteins where the wings contact DNA, the wing in K. lactis HSF is involved in HSF dimerization (59).
The N-terminal domain of AF0491 is a novel αβ-fold composed of a five-stranded antiparallel β-sheet, with pairs of helices between strands β2 and β3, and strands β4 and β5. Most of the disease-linked mutations identified in SBD syndrome patients are located in this unusual domain, and genetic evidence suggests that, in humans, the presence of the N-terminal domain is required for viability. Interestingly, this new fold was found in another protein, YHR087W, whose structure was determined by NMR spectroscopy. Subsequent sequence similarity analysis demonstrated that the two proteins are indeed distant homologues. Structural comparisons revealed the striking conservation of details such as the unusual irregular β-bulge in strand β3, and electrostatic surface characteristics. These findings prompted us to search for a shared function.
To elucidate the function of SBDS, we made use of the various experimental advantages of the yeast S. cerevisiae, and studied the SBDS structural and sequence homologues, YHR087W and YLR022C respectively. Our results link both YLR022C and YHR087W to RNA metabolism.
TAP-tagged YLR022C co-purified with numerous ribosomal proteins and proteins associated with rRNA processing. Protein complexes that are purified using the TAP protocol can also be probed for the presence of specific RNAs through hybridization to DNA microarrays. This process is particularly applicable to RNA processing enzymes, which are often ribonucleoprotein assemblies. Previously, the affinity-purified YLR022C complex was probed for potential co-purifying RNAs by using a dedicated DNA microarray (60). YLR022C was found to co-purify with snoRNAs and exhibited a profile that is similar to that of YHR040W (BCD1), an RNA-associating protein. snoRNPs, which contain these non-coding snoRNAs, have been implicated in the cleavage, modification, and folding of precursor rRNA substrates (61).
Our studies, which provide experimental evidence for a role for SBDS in RNA metabolism, support previous hypotheses which were based on bioinformatics studies. In Archaea, the orthologous gene is located in an operon that includes RNA-processing enzymes (7). Through a computational study, YLR022C was predicted to function in rRNA processing (62). This was accomplished by analyzing protein-protein interaction networks and identifying biologically relevant functional groups. Also, SBDS RNA is expressed ubiquitously (1), which is consistent with a basic cellular function such as RNA processing. Finally, in some plants, the SBDS orthologues have a fourth domain that contains a putative RNA-binding motif (1).
Although our data link the SBDS protein to ribosomal biogenesis, the specific role of SBDS in this pathway remains to be determined. Whereas the function of the ribosome is conserved in all living species, ribosomal biogenesis is fundamentally different in Bacteria compared with Archaea and Eukarya; many families of ribosomal genes are specific to Archaea and Eukarya and are absent from Bacteria (63). The observation that the SBDS gene is restricted to Archaea and Eukarya (1) suggests that SBDS plays a role that is specific for the process in these two kingdoms.
Bacterial ribosomes, though sharing many proteins with those in eukaryotes and Archaea, are assembled differently (64-67). Active prokaryotic ribosomes can be reconstituted in vitro using only the individual ribosomal components. Although this does not exclude the involvement of additional factors in vivo, it is evident that all the information needed to assemble an active prokaryotic ribosome is contained within the sequences of the ribosomal proteins and rRNAs. By contrast, eukaryotic, and likely archaeal, ribosome biosynthesis requires the coordinated action of hundreds of accessory proteins, snoRNPs, and ribosomal proteins to produce the final assembly of over 70 ribosomal proteins with four rRNAs. In this pathway, three of these rRNAs (18 S, 25 S, and 5.8 S) are produced from a single 35 S transcript whereas the fourth, 5 S rRNA, is independently transcribed. Processing of the pre-rRNA occurs following association of ribosomal and non-ribosomal proteins with the pre-rRNAs, forming a pre-ribosomal particle. Pre-rRNA modifications needed to produce the mature rRNA include cleavage of the pre-rRNAs, conversion of uridine residues to pseudouridine, and nucleotide methylation. This pre-ribosomal particle then separates into 40 S and 60 S pre-subunits. As they exit the nucleolus and then the nucleus, additional factors associate with, and dissociate from, the pre-subunits until the final maturation of the ribosome occurs in the cytoplasm.
Numerous trans-acting protein factors with a role in ribosome biogenesis have been identified in yeast, including rRNA-modifying enzymes, nucleases and putative RNA helicases, and it is likely that others have not yet been discovered (65). SBDS may be one such factor, necessary for eukaryotic, but not prokaryotic, ribosomal biosynthesis.