Structure and Mechanism of Rhomboid Protease

Rhomboid protease was first discovered in Drosophila. Mutation of the fly gene interfered with growth factor signaling and produced a characteristic phenotype of a pointed head skeleton. The name rhomboid has since been widely used to describe a large family of related membrane proteins that have diverse biological functions but share a common catalytic core domain composed of six membrane-spanning segments. Most rhomboid proteases cleave membrane protein substrates near the N terminus of their transmembrane domains. How these proteases function within the confines of the membrane is not completely understood. Recent progress in crystallographic analysis of the Escherichia coli rhomboid protease GlpG in complex with inhibitors has provided new insights into the catalytic mechanism of the protease and its conformational change. Improved biochemical assays have also identified a substrate sequence motif that is specifically recognized by many rhomboid proteases.

The Drosophila growth factor Spitz is synthesized with a C-terminal transmembrane (TM) domain that anchors it to the endoplasmic reticulum membrane. Cleavage of the TM domain by rhomboid protease is required for the release of Spitz into solution, which enables it to diffuse and bind to EGF receptors (EGFRs) in signal-receiving cells (1). Mutations in rhomboid and other essential genes in the EGFR signaling pathway produce a fused and pointed head skeleton in the larva, which gave rise to their names (Rhomboid, Spitz, and Star) (2). Homologs of the fly rhomboid have now been identified in most prokaryotic and eukaryotic organisms (3,4). As with other ancient enzyme families, rhomboid proteases and rhomboid-like proteins have acquired a wide range of biological functions during the course of evolution (e.g. see Refs. 1 and 5-11). The recent elucidation of the role of the protease in the infection of human cells by the apicomplexan parasites Toxoplasma gondii and Plasmodium falciparum suggests that inhibition of rhomboid protease may have medical value (e.g. see Refs. 12-14). Because the functions of rhomboid proteases have been extensively reviewed (15-17), we will focus this minireview on the mechanism of the protease, an area in which significant progress has been achieved recently. This topic may also have broader implications because rhomboid protease, site-2 protease, and γ-secretase represent a distinct class of proteases called intramembrane-cleaving proteases (I-CLiPs) (18-20). Unlike their soluble counterparts, the I-CLiPs operate within the hydrophobic environment of the lipid bilayer and specialize in cleaving membrane protein substrates. Rhomboid protease was the first intramembrane protease whose crystal structure was solved (21) and is presently the best characterized intramembrane protease in terms of structure and catalytic mechanism.
The membrane topology and three-dimensional structure of the catalytic core domain of the Escherichia coli rhomboid protease GlpG are shown in Fig. 1. The crystal structures of the E. coli protease and a related rhomboid from Haemophilus influenzae have been studied by several groups (21-25). With the exception of a surface loop (L5) and one of the TM helices (S5), which we discuss below, the independently obtained structures, including one from lipid bicelles (26), are all very similar to each other. The catalytic core domain of GlpG is composed of six membrane-spanning segments (S1-S6), which harbor a number of highly conserved sequence motifs that are characteristic of the family (3). Crystallographic analyses revealed the fold of the membrane protein and showed that the HxxxN motif in S2 (His-150; Asn-154 in the E. coli protease), the GxSG motif near the N terminus of S4 (Ser-201), and the (A/G)H motif in S6 (His-254) are all essential elements of the active site of the enzyme. Ser-201 and His-254 are hydrogen-bonded to each other and form a rudimentary catalytic dyad. The other conserved motifs seem to play mainly a structural role; for example, the tight packing between S4 and S6 is made possible by a conserved small amino acid (A/G) at position 206 in S4 and the GxxxG motif in S6. Besides the basic 6-TM configuration represented by E. coli GlpG, some rhomboid proteases, e.g. the mitochondrial rhomboid PARL and Drosophila Rhomboid-1, have an additional TM helix outside the core domain toward either the N terminus (1 + 6) or the C terminus (6 + 1) of the protein (4). Although their structures are not yet known, the 7-TM versions of the protease are expected to share the same basic catalytic mechanism.
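The conserved motifs named above (HxxxN, GxSG, (A/G)H, and GxxxG) can be written as simple sequence patterns. The following is a minimal sketch, assuming that "x" stands for any residue; the function name and the toy sequence are hypothetical illustrations, and the toy sequence is not the GlpG sequence. Because the patterns are short, chance matches are expected in real proteins, so a hit is only a starting point for inspection and says nothing about which TM segment it lies in.

```python
import re

# Conserved rhomboid motifs named in the text, written as regular expressions.
# "x" in the text means any residue, so it becomes "." here.
MOTIFS = {
    "HxxxN": r"H...N",
    "GxSG": r"G.SG",
    "(A/G)H": r"[AG]H",
    "GxxxG": r"G...G",
}


def find_motifs(sequence):
    """Return {motif name: [1-based start positions]} for every match.

    A lookahead is used so that overlapping matches are also reported.
    """
    seq = sequence.upper()
    hits = {}
    for name, pattern in MOTIFS.items():
        lookahead = re.compile("(?=(" + pattern + "))")
        hits[name] = [m.start() + 1 for m in lookahead.finditer(seq)]
    return hits


# Hypothetical toy sequence for illustration only (NOT a real rhomboid).
toy_sequence = "MKHAAANLLGASGWWAHTTGLLVG"

if __name__ == "__main__":
    for motif, positions in find_motifs(toy_sequence).items():
        print(motif, positions)
```

In practice, such a scan would be combined with a TM-helix prediction so that, for example, only an HxxxN match falling in the second predicted helix is considered.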

Catalytic Mechanism and Inhibitor Binding
Based on sequence conservation and site-directed mutagenesis, it was recognized early that rhomboid proteases belong to the serine catalytic class (1). It was hypothesized initially that Ser-201, His-254, and Asn-154 (E. coli GlpG numbering) form a catalytic triad, a variant of the classical Ser-His-Asp triad (1), but later studies found that Asn-154 was not essential for enzymatic activity, suggesting that the catalytic apparatus might consist only of a Ser-His dyad (27,28). This is now confirmed by the crystal structures (Fig. 1). The sequences (GxSG) surrounding the catalytic serines of rhomboid and chymotrypsin are similar (1), but this appears to be coincidental. The structure of rhomboid's active site is very different from that of chymotrypsin. In chymotrypsin, the backbone amide of the first glycine of the GxSG motif is pointed into the active site and, together with the amide of the serine, forms the oxyanion-binding site of the protease (29); in rhomboid, the amide group of the glycine is pointed away from the active site and does not contribute to catalysis.
Earlier biochemical studies using class-specific inhibitors led to the suggestion that rhomboid proteases might be different mechanistically from the other serine proteases; with the exception of 3,4-dichloroisocoumarin (DCI) (1,27,30), most inhibitors, including a few that have broad-spectrum activity, were found to be ineffective against rhomboid (30). Like the others, DCI reacts with the catalytic serine of the protease in a mechanism-dependent manner (31), but why it is the only compound that can broadly inhibit rhomboid could not be explained. Later studies showed that, depending on reaction conditions, even DCI does not always achieve complete inhibition (Footnote 4), and efforts to visualize the covalent adduct of DCI with GlpG by x-ray diffraction were also unsuccessful (32). We know now that the crystallographic experiment failed because the covalent complex of DCI with GlpG is unstable and that the protease can regain activity through deacylation (Fig. 2A) (33). A critical breakthrough was made by Vinothkumar et al. (25), who finally found the right isocoumarin to work with. The new compound differs from DCI by having a methoxy substitution at position 3 and a 7-amino group; the binding of the new compound to GlpG is slightly different, so the catalytic histidine can now react with the C4 atom of the inhibitor to form an unbreakable bond (Fig. 2A). The finding quickly led to the first crystal structure of a protease-inhibitor complex; in the complex, Ser-201 is covalently bonded to the C1 atom of the inhibitor, confirming that the serine functions as a nucleophile and directly attacks the carbonyl carbon during catalysis (Fig. 2C). The resulting acyl-enzyme is the hallmark of the classical mechanism. Shortly afterward, the crystal structure of GlpG in complex with diisopropyl fluorophosphate (DFP), a classical serine protease inhibitor (Fig. 2B), was also solved. DFP phosphorylates the catalytic serine and stably inhibits the proteolytic activity of GlpG (33).
DFP performed poorly in an earlier study (30), and the reason for the discrepancy between the two studies is unclear at this time. The covalent adduct of DFP with GlpG mimics the oxyanion-containing tetrahedral intermediate of the proteolytic reaction. In the crystal structure, the phosphoryl oxygen of the inhibitor is hydrogen-bonded to the main chain NH group of Ser-201 and the side chains of His-150 and Asn-154 (the HxxxN motif in S2) (Fig. 2D), suggesting that these groups may contribute to the stabilization of the oxyanion developed during peptide hydrolysis.
The exact binding mode of peptide to the protease active site is not yet known. A model currently favored by most researchers predicts that TM substrates approach the protease from the direction of TM helices S2 and S5 (22,23,32,25,34). The cleavage site of rhomboid protease is located near the N terminus of the TM domain of the substrate, and given the known position of the oxyanion hole, this would require the catalytic serine to attack the carbonyl carbon from the si-face of the peptide bond (32,34), which is uncommon but not unprecedented (Fig. 2E) (35). The mutagenesis data showing that Ala-253 may contribute to the S1 pocket (where the side chain of the P1 residue binds) are consistent with this model (Fig. 2E, inset) (see below).
A clarification of the rhomboid protease catalytic mechanism will facilitate the development of rhomboid-specific inhibitors, which are potentially useful in medicine as adjunct therapy in treating apicomplexan infections (12-14). Because rhomboid protease uses the same chemical mechanism for catalysis as its soluble counterparts, modifying known serine protease inhibitors may represent a fruitful approach in this endeavor. Even when high throughput screening is employed to discover novel compounds, the mechanistic knowledge will be helpful in evaluating early hits and in deciding which chemical classes are worth pursuing further. Both approaches were used in the recent discovery of the β-lactam class of rhomboid inhibitors (36).
Footnote 4: Y. Ha, Y. Akiyama, and Y. Xue, unpublished data.

Conformational Change in the Protease and Substrate
The active site of rhomboid protease is hydrophilic and has to be closed initially to minimize unfavorable contact with the lipid molecules that surround the protein from the side. The TM domain of the rhomboid substrate also initially adopts a helical conformation incompatible with cleavage by the protease. Therefore, the protease and substrate must both undergo conformational changes before their productive binding can take place. In the absence of the crystal structure of a protease-substrate complex, the nature of these conformational changes has been debated.
Crystallographic analyses of GlpG in complex with two classes of inhibitors suggest that the conformational change in the protease is likely subtle (25,33,37). The complex with Carboxybenzyl-Ala P (O-iPr)F (CAPF), which is the largest of these inhibitors, is illustrated in Fig. 3 (A and B). CAPF occupies the S′ side of the protease active site; it forms a covalent bond with the catalytic serine and extends toward the gap between TM helices S2 and S5 (37). Real TM substrate is expected to pass through this gap to enter into the active site. CAPF binding displaces a loop, which we called the L5 cap (32), from the substrate-binding cleft (the opening of the L5 cap also unblocks the gap between S2 and S5) but causes only minor movement in the TM helices. The lack of any major movement, especially in the TM region of the protease, was confirmed by a co-crystallization experiment in which the conformational change in the protease was not restricted by any preformed crystal lattice (38).
The discovery that rhomboid proteases can cleave not only peptide bonds initially buried within TM regions but also hydrophilic sequences outside the TM domains (28,39,40) led to a "top-down" model in which peptide substrates bend into the protease active site from above the membrane plane (Fig. 3D) (41). This represents a fundamental departure from the earlier hypothesis that substrate enters the protease laterally from inside the membrane bilayer. It is easy to visualize how this model may apply to cleavages in the solvent-exposed juxtamembrane region, but how about those that occur inside the TM domains? The crystal structures provide a possible clue; high resolution analysis, in which water and detergent molecules can be differentiated, revealed that the hydrophobic belt of the membrane protein (the part of the protein surface that contacts the hydrocarbon tails of the lipids) is quite thin (~20 Å), suggesting that the membrane is constricted around the protease (41). This is supported by molecular dynamics simulations (42) and by studying the structure of the protein in lipid bicelles (where the detergent-solubilized protein is reconstituted into a local bilayer structure) (Fig. 3C) (26). The constriction of membrane around the protease may facilitate the partition (transfer) of the buried substrate cleavage site into aqueous solution or directly into the active site of the protease. Because most residues within the cleavage site, e.g. the glycines, have low hydrophobicity, such a transfer should not be too costly in terms of free energy (20,43). It is hypothesized that, once outside the membrane, the helical peptide can easily unfold into an extended conformation (its backbone can now form hydrogen bonds with water) and become susceptible to cleavage by the protease (the conformational change in the L5 loop will enable the peptide to pass through the S2-S5 gap without steric hindrance) (Fig. 3A).
FIGURE 2. Catalytic mechanism. A, GlpG catalyzes the hydrolysis of DCI to form an α-hydroxy acid. The complex between 7-amino-4-chloro-3-methoxyisocoumarin and GlpG is stabilized by two covalent bonds. B, the covalent adduct between DFP and GlpG mimics the tetrahedral transition state. C and D, the crystal structures of GlpG in complex with isocoumarin and DFP, respectively (Protein Data Bank codes 2XOW and 3TXT) (25,33). E, hypothetical model of substrate (green) bound to rhomboid protease (side view; 90° from that in Fig. 1B). The protease TM helices are shown as cylinders, and the loops are omitted for clarity. The extended cleavage site and helical TM segment of the substrate are connected by a sharp turn (green dots). According to this model, Ala-253 is adjacent to the side chain of the substrate P1 residue (inset). The red arrows indicate the scissile bond.
The "S5 lateral gating" model offers a different explanation for the conformational change in the protease (22,44,45). The model was based on an earlier crystal structure of the apoprotease in which TM helix S5 is tilted drastically away from the other helices (22). This movement was thought to open a gate inside the membrane for substrate to enter the protease laterally. Mutations designed to weaken the interactions between S5 and S2 have been found to enhance the protease activity (44,46). The large tilting movement of the S5 helix is not observed, however, in any of the protease-inhibitor complex structures solved later. Furthermore, a recent study showed that cross-linking S5 to S2 in the F153C/W236C double mutant (Figs. 2E and 3A) does not hinder the ability of the protease to cleave a TM substrate in either detergent solution or reconstituted membrane vesicles (38). Because the cross-linker physically blocks the path between the two helices, the new experiment demonstrated that the TM substrate is fully capable of climbing over residues 153 and 236 (Fig. 2E), thus avoiding the proposed lateral gate, to reach the active site.
FIGURE 3. ... (21,37). CAPF is shown as space-filling models. TM helices S2 and S5 are colored in dark blue, and the L5 cap is highlighted in yellow. B, the movement of the TM helices is small (top view). The Cα traces of the apoprotein (gray) and the CAPF complex (brown) are shown. The gray arrow indicates the direction of the tilt of S5 according to the lateral gating model, and the black arrow indicates the movement of S5 in the CAPF complex. C, structure of the GlpG S201T mutant in a lipid environment (Protein Data Bank code 2XTV) (26). The protein surface is color-coded according to the electrostatic potential. Red, negative; blue, positive. The lipid molecules are shown as space-filling models. Yellow, carbon; red, oxygen. The estimate of the membrane lower boundary is higher than that in Ref. 26 to exclude the polar (lipid) oxygen atoms from the hydrophobic core. D, schematic diagram illustrating that buried (red box, helical) and exposed (red zigzag line, extended) cleavage sites use a similar mechanism to enter the active site of the protease (shown as a cross-section). The gray lines indicate the boundaries of the membrane. Because it is not yet possible to predict precisely where the TM helices end, we do not know for certain how deep the scissile bonds are buried.

Substrate Specificity
Rhomboid proteases are unique among I-CLiPs, as substrate cleavage does not depend on prior ectodomain shedding of the substrate, which appears to be critical for determining substrate specificity in other I-CLiP families. Nonetheless, rhomboids specifically recognize their substrates; Drosophila Rhomboid-1 cleaves membrane-bound EGFR ligand precursors (e.g. Spitz, Keren, and Gurken) but not similar membrane proteins (e.g. TGFα, Delta, and TGN38) (47). Drosophila Rhomboid-1 and AarA (a rhomboid homolog from the Gram-negative bacterium Providencia stuartii) act interchangeably in these organisms, and AarA and other bacterial rhomboids specifically cleave EGFR ligand precursors, suggesting that rhomboids recognize their substrates via a common and specific mechanism (48). Understanding the principal determinants of substrate specificity should promote identification of new rhomboid substrates and elucidation of the mechanism.
Chimeric analysis of Spitz and TGFα/Delta indicated that a small luminal region of the Spitz TM segment is necessary and sufficient for cleavage by Rhomboid-1 (49). Mutational analysis of this region showed that the most important determinants are helix-destabilizing residues. Based on sequence similarity to this Spitz substrate motif, T. gondii micronemal adhesin was identified as a rhomboid substrate. Studies with model substrates have provided information on substrate specificity for the E. coli rhomboid homolog GlpG; in addition to helix-destabilizing residues around the cleavage region, those in the substrate TM region are critical for cleavage (50). Although it is thought that I-CLiPs cleave their substrates within the plane of the membrane, AarA and GlpG could cleave a substrate at the correct site when the cleavage site was moved into the juxtamembrane or membrane/extracytoplasm interface region by insertion of a hydrophilic linker sequence (39,51), suggesting that they can cleave a substrate region exposed to the hydrophilic milieu. Helix-destabilizing residues in the substrate TM region may cause local unfolding, facilitating exposure of the membrane-embedded cleavage site via membrane thinning around the enzyme (41) and/or its presentation to the proteolytic active site. Random mutagenesis of residues on each side of the scissile bond (P1 and P1′ residues), combined with in vivo screening, showed that residues with a small side chain and with a small or negatively charged side chain are preferred at the P1′ and P1 sites, respectively, for proteolysis by GlpG (50). However, these substrate features were insufficient to predict new rhomboid substrates.
A more comprehensive mutational analysis of the region surrounding the cleavage site of TatA, a physiological substrate of AarA, identified a specific sequence motif commonly recognized by rhomboids (Fig. 4) (51). Freeman and co-workers (51) individually mutated 7 residues (from positions P5 to P2′) to 1 of 19 other amino acids and examined cleavage of mutated substrates by AarA. Residues at the P4, P1, and P2′ positions were the most sensitive to substitutions for in vitro and in vivo cleavage. P1 required small residues (Gly, Ala, Ser, and Cys), whereas P4 and P2′ required bulky hydrophobic residues (Val, Leu, Ile, Phe, and Trp), although small residues (Ala, Ser, Cys, and Thr) at P2′ also permitted cleavage. This motif was also recognized by other prokaryotic and eukaryotic rhomboid proteases. Moreover, a similar motif was found in other substrates, including Gurken and Spitz, and was cleavable by AarA. For cleavage of a substrate with an exposed cleavage site, there was a stricter requirement for this sequence motif around the cleavage site than for helix-destabilizing residues in the TM region.
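The residue preferences described above can be expressed as a simple rule check on a seven-residue window running from P5 to P2′. The following is a minimal sketch, assuming one-letter amino acid codes and treating the three sensitive positions as hard requirements; the function name and the example windows are hypothetical and are not taken from TatA or any other real substrate. The published analysis graded substitutions by cleavage efficiency, so a yes/no rule like this is a simplification.

```python
# Residue classes taken from the text (one-letter codes).
SMALL_P1 = set("GASC")            # Gly, Ala, Ser, Cys required at P1
BULKY_HYDROPHOBIC = set("VLIFW")  # Val, Leu, Ile, Phe, Trp at P4 and P2'
SMALL_P2_PRIME = set("ASCT")      # Ala, Ser, Cys, Thr also tolerated at P2'


def matches_rhomboid_motif(window):
    """Check a 7-residue window ordered P5, P4, P3, P2, P1, P1', P2'.

    The scissile bond lies between index 4 (P1) and index 5 (P1').
    Only P4, P1, and P2' are tested, because the text reports these
    as the positions most sensitive to substitution.
    """
    if len(window) != 7:
        raise ValueError("window must contain exactly 7 residues (P5 to P2')")
    window = window.upper()
    p4, p1, p2_prime = window[1], window[4], window[6]
    p2_prime_ok = p2_prime in BULKY_HYDROPHOBIC or p2_prime in SMALL_P2_PRIME
    return p4 in BULKY_HYDROPHOBIC and p1 in SMALL_P1 and p2_prime_ok


if __name__ == "__main__":
    # Hypothetical windows for illustration only.
    for example in ("QIATAAF", "QDATAAF", "QIATKAF", "QIATAAT"):
        print(example, matches_rhomboid_motif(example))
```

Because the rule ignores P5, P3, P2, and P1′, it will accept many sequences that are not real substrates; it is best read as a first filter rather than a predictor.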
Analysis of the complex formed between GlpG and an isocoumarin inhibitor suggested the presence of pockets (S1- and S1′-binding subsites) that accommodate the side chains of P1 and P1′ residues (25). Mutations reducing the cavity size of the putative S1 pocket compromised cleavage of WT TatA (Ala at P1) but exerted weaker effects on cleavage of a mutant with a smaller residue at this position, TatA A8G, providing a structural explanation for preference for the P1 position. Screening with an algorithm based on the specificity matrix enabled identification of AarA substrates; among the 15 top-scored candidates, 38% were cleavable by AarA (51).
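The idea of screening with a specificity matrix can be illustrated by sliding a position-specific score table along a sequence and ranking the candidate cleavage sites. The sketch below is not the published algorithm: the matrix values (+1 for residues the text reports as accepted at P4, P1, and P2′, -1 for other residues at those positions, and 0 elsewhere) are hypothetical placeholders, as are the function names and the test sequence. A real matrix would be derived from measured cleavage efficiencies for each substitution.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def toy_matrix():
    """Build a hypothetical 7-column score matrix for positions P5..P2'."""
    accepted = {
        1: set("VLIFW"),      # P4: bulky hydrophobic
        4: set("GASC"),       # P1: small
        6: set("VLIFWASCT"),  # P2': bulky hydrophobic or small
    }
    matrix = []
    for position in range(7):
        if position in accepted:
            column = {aa: (1.0 if aa in accepted[position] else -1.0)
                      for aa in AMINO_ACIDS}
        else:
            column = {aa: 0.0 for aa in AMINO_ACIDS}  # positions not scored
        matrix.append(column)
    return matrix


def score_window(window, matrix):
    """Sum the per-position scores; unknown residues contribute 0."""
    return sum(column.get(aa, 0.0) for aa, column in zip(window, matrix))


def rank_sites(sequence, matrix, top=3):
    """Return the best (score, 1-based P1 position, window) tuples."""
    sequence = sequence.upper()
    width = len(matrix)
    scored = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        p1_position = start + 5  # P1 is the 5th residue of the window
        scored.append((score_window(window, matrix), p1_position, window))
    # Highest score first; ties broken by position along the sequence.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return scored[:top]


if __name__ == "__main__":
    # Hypothetical sequence for illustration only.
    for hit in rank_sites("KKQIATAAFKK", toy_matrix()):
        print(hit)
```

A proteome-wide screen along these lines would also restrict the search to predicted TM and juxtamembrane regions, since the motif alone occurs frequently by chance.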
Although the proposed motif would provide a basis for substrate recognition by rhomboids in many cases, not all of the rhomboid substrates have this motif, and it is likely that other mechanisms for substrate recognition/cleavage may exist. For example, cleavage of thrombomodulin by Rhomboid-2 depends on its cytoplasmic rather than TM domain (52). Mitochondrial rhomboids may also recognize different substrate features (53). Although rhomboids have substrate preference for type I (Nout-Cin) single-spanning membrane proteins, type II (Nin-Cout) single-spanning membrane proteins (54,55) and even multi-spanning membrane proteins can act as rhomboid substrates (56,57). Thus, further studies are required to fully understand rhomboid substrate specificity.

Future Prospects
Since the landmark discovery that Drosophila Rhomboid-1 represents a new class of membrane-bound proteases (1), the field has expanded tremendously. The biological functions of many related rhomboid proteins are now known, and there is optimism that the pace of such discoveries will only quicken in the near future. The crystal structures of E. coli and H. influenzae GlpG proteins have provided a framework for in-depth probing of the mechanism of action of the membrane protein. On this second front, future work needs to focus on the following areas. (i) The crystal structure of rhomboid protease in complex with a peptide substrate analog covering both S and S′ subsites has to be solved to explain the basis of the observed substrate specificity. (ii) The structure of the protease with a bound TM substrate is required to fully explain the nature of the conformational change in the enzyme. (iii) Complementing the structural characterizations, biophysical studies should be carried out to examine how rhomboid protease interacts with the lipid bilayer and how such interactions may influence the protease activity. (iv) The biochemical mechanism of the mitochondrial rhomboid PARL should be investigated more thoroughly (55,58). PARL has an inverted catalytic core domain and, unlike the others, cleaves in the middle of a hydrophobic sequence downstream of the primary TM domain of the substrate. (v) T. gondii and P. falciparum rhomboids are also interesting subjects for future research because they play essential roles in the life cycles of two medically important parasites and demonstrate unique substrate specificities (14,59).
FIGURE 4. Determinants for rhomboid substrate specificity. P1 and P4 are the first and fourth residues on the N-terminal side of the scissile bond, respectively, and P2′ is the second residue on the C-terminal side. The S1 and S2′ subsites of rhomboid recognize the P1 and P2′ residues of the substrate. HB, helix-breaker residue.