The Crystal Structure of the Periplasmic Domain of the Escherichia coli Membrane Protein Insertase YidC Contains a Substrate Binding Cleft*

In bacteria the biogenesis of inner membrane proteins requires targeting and insertion factors such as the signal recognition particle and the Sec translocon. YidC is an essential membrane protein involved in the insertion of inner membrane proteins together with the Sec translocon, but also as a separate entity. YidC of Escherichia coli is a member of the conserved YidC (in bacteria)/Oxa1 (in mitochondria)/Alb3 (in chloroplasts) protein family and contains six transmembrane segments and a large periplasmic domain (P1). We determined the crystal structure of the periplasmic domain of YidC from E. coli (P1D) at 1.8 Å resolution. The structure of P1D shows the conserved β-supersandwich fold of carbohydrate-binding proteins and an α-helical linker region at the C terminus that packs against the β-supersandwich by a highly conserved interface. P1D exhibits an elongated cleft of similar architecture as found in the structural homologs. However, the electrostatic properties and molecular details of the cleft make it unlikely to interact with carbohydrate substrates. The cleft in P1D is occupied by a polyethylene glycol molecule suggesting an elongated peptide or acyl chain as a natural ligand. The region of P1D previously reported to interact with SecF maps to a surface area in the vicinity of the cleft. The conserved C-terminal region of the P1 domain was reported to be essential for the membrane insertase function of YidC. The analysis of this region suggests a role in membrane interaction and/or in the regulation of YidC interaction with binding partners.

Membrane proteins represent more than one-third of the gene-encoded proteome in most organisms and are essential for numerous fundamental biological processes (1). The insertion and assembly of membrane proteins is therefore of critical importance to all organisms. In Escherichia coli, the biogenesis of inner membrane proteins (IMPs) is predominantly accomplished in a co-translational manner and involves three distinct steps (2): (i) membrane targeting, mediated by the signal recognition particle (SRP) and the SRP-receptor FtsY (3); (ii) insertion into the lipid bilayer by the Sec translocon, consisting of the protein-conducting channel SecYEG, the accessory complex SecDFYajC, and the ATPase SecA (4-6); and (iii) folding and final assembly into a lipid-embedded functional structure. Although the protein-conducting channel (SecYEG) is rather well characterized, the polytopic IMP YidC from E. coli was identified only recently. It plays a central and versatile role during the integration, folding, and assembly of IMPs (7-12).
YidC associates with the Sec translocon (mainly with SecDFYajC) and was suggested to operate downstream of the SecYEG channel, catalyzing the final anchoring of Sec-dependent substrates (e.g. FtsQ, Lep, MtlA, LacY) into the membrane and/or their folding into a physiological conformation (6, 13-15). However, YidC is present in excess over the Sec translocon (16) and has also been described as a Sec-independent membrane insertase (9). A number of IMPs utilize this alternative insertion pathway, including the small phage coat proteins M13 and Pf3 and the endogenous IMPs Foc and MscL (7, 17-19). YidC has also been shown to be involved in the targeting and translocation of lipoproteins (20). Therefore, YidC can be described as a chaperone or a channel (21, 22).
YidC is homologous to Alb3 and Oxa1, which are involved in the integration of proteins into the thylakoid membrane of chloroplasts and the inner membrane of mitochondria, respectively (11, 23). All YidC homologs contain five putative transmembrane (TM) segments that are thought to be essential for membrane protein insertion (21, 24-26). Unique regions at the N and C termini reflect specific requirements for the interaction with other targeting and insertion factors as well as for the topology of the protein substrates. Oxa1 contains a C-terminal domain for the interaction with the ribosome (27, 28). YidC from Gram-negative bacteria contains an additional N-terminal TM helix and a periplasmic domain (P1) between TM1 and TM2 (29). The P1 domain of E. coli YidC was shown to interact with SecF (30). Interestingly, deletion of TM1 and of 90% of the P1 domain does not abolish the insertase function (25). However, deletion of the C-terminal region of the P1 domain impairs cell viability and membrane insertion of a number of Sec-dependent and -independent substrates (30). The introduction of site-specific protease sites at the borders of the P1 domain results in a cold-sensitive YidC mutant with altered insertase activity (31). Taken together, the role of YidC in membrane protein insertion and, specifically, the role of the P1 domain are still not well understood. To date, E. coli YidC is the best characterized member of the Oxa1/Alb3/YidC family. Only a low-resolution projection structure from cryo-electron microscopy is available; it suggests that the YidC monomer is a membrane pore that might be able to associate with itself (forming homodimers) or with the Sec translocon, depending on the protein substrate (32). The structural arrangement of YidC and specifically of the P1 domain is, however, not resolved.
Here we report the crystal structure of the P1 domain of YidC from E. coli at 1.8 Å resolution. The structure provides insights into the role of the P1 domain for YidC function.

EXPERIMENTAL PROCEDURES
Structure Determination and Refinement-The cloning, overexpression, purification, and crystallization of P1D (residues 56-329 of E. coli YidC) and its selenomethionine (SeMet) derivative were performed as described (73). The structure of P1D was determined using single wavelength anomalous diffraction data collected on a SeMet crystal at the peak wavelength (λ = 0.9790 Å) and the PHENIX (Python-based Hierarchical Environment for Integrated Xtallography) program suite (33, 34). Details of data collection and processing are given elsewhere (73). The eight selenium sites (4 selenium/molecule; 2 molecules in the asymmetric unit) present in the asymmetric unit and the non-crystallographic symmetry relations were identified and the phases were calculated. Density modification, phase extension, and automatic model building were carried out using a remote data set collected at 1.8 Å resolution and previously scaled to the peak wavelength data using SCALEIT (35). The AUTOBUILD routine from PHENIX was able to build 473 residues (and the Cα traces of 16 additional amino acids) for the two molecules of the asymmetric unit. The model was completed using iterative cycles of model building in Coot (36), and refinement was done with REFMAC5 (35, 37). The two protein chains were refined independently. The model quality was analyzed with PROCHECK (38). The refinement statistics are summarized in supplemental Table S1.
Structure Analysis-The secondary structure was analyzed with DSSP (39); the multiple sequence alignments were prepared with ClustalW (40) from sequences retrieved by a PSI-BLAST search and rendered with ESPript (41). Surface potentials were calculated with GRASP (42). Conservation scores were determined using the ConSurf server on the basis of 40 sequences of P1 domains from Gram-negative bacteria. Residues interacting with the polyethylene glycol (PEG) molecule were identified by LIGPLOT analysis (43). The PEG/protein and protein/protein interactions within the crystal packing were analyzed with the PISA server (44). Figures were generated with PyMOL.
Characterization of P1D in Solution-The apparent molecular mass was analyzed by size exclusion chromatography using a Superdex 200 HR 10/30 column (GE Healthcare) in 10 mM Hepes, pH 7.5, 200 mM NaCl (buffer A) with or without 10 mM CaCl2. The absolute molecular mass was determined by static light scattering (mini Dawn Tristar; Wyatt Technologies) and refractive index analysis (WGE Dr. Bures Δn1000, Dallgow, Germany). The oligomeric state of P1D in solution was determined by analytical ultracentrifugation experiments using a Beckman Optima XL-A ultracentrifuge equipped with absorbance optics and an An60 Ti rotor (Beckman Coulter, Fullerton, CA). Sedimentation velocity studies were carried out at 35,000 rpm at 20°C in three different buffers (buffer A, buffer A + 1 mM CaCl2, and buffer A + 50 mM CaCl2).

RESULTS AND DISCUSSION
Overall Structure of P1D-E. coli YidC has a molecular mass of 61 kDa and consists of six transmembrane helices. The P1 domain comprises residues 24-342 and is inserted in the periplasmic loop between TM1 and TM2 (29) (Fig. 1A). Primary structure analysis revealed that the N-terminal region of the P1 domain (residues 26-55) contains a low complexity region predicted to be unstructured according to multiple sequence alignments and secondary structure predictions. The C-terminal region of the P1 domain could not be assigned unambiguously (see below). Therefore, two constructs of the P1 domain were expressed and purified, representing residues 56-329 and 56-342, respectively (73). Although the longer one suffered from aggregation problems, the shorter construct (P1D, residues 56-329) yielded soluble protein and diffraction quality crystals (73). The crystal structure was determined at 1.8 Å resolution using single wavelength anomalous diffraction and refined using standard procedures. The refinement statistics are given in the online supplemental Table S1.
The P1D structure is well ordered (overall B-factor of 17.2 Å²) with the exception of residues 204-216, which are part of a flexible loop. P1D has an overall globular shape with approximate dimensions of 60 × 50 × 40 Å (Fig. 1, B and C). The major part (residues 56-310) consists of two twisted anti-parallel β-sheets, S1 and S2. S1 (11-stranded β-sheet, major) and S2 (8-stranded β-sheet, minor) pack against each other and form a β-supersandwich fold (Structural Classification of Proteins database) (45). The curvature of the two β-sheets creates a concave surface on the S1 side and a convex surface on the S2 side. A number of irregular loops as well as a 3₁₀ helix (η1) between β5 and β6 and one α helix (α1) between β12 and β13 connect the β-strands. Sequence analysis using different bioinformatic tools did not allow prediction of this fold. The C terminus of the P1D structure (residues 311-329) consists of the helices α2 and α3, which are packed against the S2 layer of the β-supersandwich by hydrophobic interactions (see below, Fig. 6B). The helix α3 is truncated and forms a 3₁₀ helix in P1D, but it is predicted to extend to residue 338 toward TM2 in full-length YidC (see below, Fig. 6A). Therefore, we refer to the C-terminal, helical part of P1D as the linker region. P1D is conserved in YidC of Gram-negative bacteria (Fig. 1D). More than 40 homologous sequences were identified by PSI-BLAST analysis, with sequence identities varying from 20 to 99%. Structure-based sequence alignments suggest that they all share the topology of the P1D structure with a protein core adopting a β-supersandwich fold and a helical linker region at the C terminus.
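For readers who wish to reproduce identity figures of the kind quoted above, the following is a minimal sketch of one common convention: percent identity counted over alignment columns in which both sequences carry a residue. The function name and the toy sequences are illustrative assumptions; the text does not state which convention or tool produced the 20 to 99% range, and other definitions (for example, normalizing by the shorter sequence length) give different numbers.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two rows of a sequence alignment.

    Only columns where both rows contain a residue (no '-' gap) are
    compared. This is one convention among several; it is an assumption
    here, not the definition used in the article.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")

    compared = 0   # columns without a gap in either row
    identical = 0  # compared columns with the same residue
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a == b:
            identical += 1

    if compared == 0:
        return 0.0
    return 100.0 * identical / compared


# Toy example (hypothetical rows, not YidC sequences):
# five gap-free columns are compared and four are identical.
print(percent_identity("ACDE-F", "ACDQGF"))
```

Ignoring gapped columns keeps the score independent of insertion length, which is convenient when loop lengths vary between homologs, as they do in the structural relatives discussed later.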
Analysis of Crystal Packing-YidC has been reported to exist as a monomer or dimer in vivo; however, its oligomeric state is still a matter of debate (8, 23, 32, 46). P1D is monomeric in solution according to size exclusion chromatography, static light scattering, and sedimentation velocity experiments (data not shown). However, two molecules are present in the asymmetric unit of the crystal (molecules A and B), related by a 2-fold non-crystallographic symmetry axis. They have very similar structures with a root mean square deviation of 0.5 Å and interact by the following regions: α2, β19, the loop between β6 and β7, and the loop between β16 and β17 (Fig. 2A). P1D interactions between adjacent asymmetric units involve different regions and are favored by calcium ions (Fig. 2). Two calcium ions are bound between the η1 helix of chain A and the η1 helix of chain B of an adjacent asymmetric unit (Fig. 2B). The three other calcium ions were observed at the interfaces between the β11-β12 loops of two molecules, A and B, from two adjacent asymmetric units (Fig. 2C). Because crystals were only obtained (or stable) in the presence of calcium ions, the ions are likely to promote crystallization by tightening the interaction between two P1D molecules. However, calcium ions are unable to promote dimerization in solution as tested by size exclusion chromatography and analytical ultracentrifugation experiments (data not shown). The interfaces between molecules A and B in the crystal bury around 1000 Å², which is in the lower range of values described for oligomeric proteins. However, the analysis of the structural and chemical properties and of the probable dissociation pattern of the different assemblies revealed that one of them is energetically favorable. In this case, the concave sides of two A and B molecules are facing each other, forming a large central cavity (Fig. 2A). The biological relevance of such an interaction needs to be investigated.
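The root mean square deviation quoted for molecules A and B can be illustrated with a short sketch. It assumes that the two chains have already been superposed and that the atoms (for example Cα atoms) are supplied as matched pairs in the same order; the superposition step itself is not shown, and the text does not state which program or atom selection produced the 0.5 Å value. The function name and coordinates are illustrative only.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two matched coordinate lists.

    coords_a, coords_b: sequences of (x, y, z) tuples in Å, already
    superposed and listed in the same atom order (an assumption of
    this sketch; no fitting is performed here).
    """
    if len(coords_a) != len(coords_b) or len(coords_a) == 0:
        raise ValueError("need two non-empty lists of equal length")

    squared_sum = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared_sum += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2

    return math.sqrt(squared_sum / len(coords_a))


# Toy coordinates (hypothetical, not taken from the P1D model):
chain_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
chain_b = [(0.0, 0.0, 0.5), (1.0, 0.5, 0.0)]
print(rmsd(chain_a, chain_b))
```

Because the value depends on which atoms are paired and on how the chains are fitted, deviations from different sources are comparable only when the atom selection and superposition method match.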
P1D Shares the Fold of Carbohydrate-binding Proteins-Analysis of the P1D structure using the Dali server (47) identified a number of proteins from the galactose mutarotase-like family as structural relatives (supplemental Table S2). Comparison of the three-dimensional structures and topologies shows that P1D is similar to the lectin-like domains found in calnexin, an endoplasmic reticulum chaperone involved in quality control of protein folding (48); in neurexin, a putative cell recognition molecule (49); and also in the carbohydrate recognition domain of p58/ERGIC-53, an animal lectin involved in glycoprotein export from the endoplasmic reticulum (50). Number and length of the β-strands and/or of the connectivities differ between P1D and these structures as reflected in the high root mean square deviation of ~3.5-4 Å, and the sequence identities are very low (<12%). However, the main characteristics of the fold are maintained: a pair of antiparallel β-sheets curved to form a concave and convex side (Fig. 3). The β-sandwich motif was reported to have two putative functions. First, in maltose or chitobiose phosphorylases, rhamnogalacturonase, glucoamylase, and galactosidase (supplemental Table S2), it participates in intramolecular interactions and was related to transglycosylation, thermostability, or fold correction (51-54). Second, it is involved in sugar recognition (lectin-like domains) and plays a role in catalysis (galactose mutarotase, aldose epimerase, OpgG). Carbohydrate recognition occurs on the concave side of the domain, in a large open cleft in the lectin family, whereas in enzymes like galactose mutarotase or OpgG the cleft is narrower and more shielded by loops. The carbohydrate binding residues are not strictly conserved among the different families but typically involve a highly negatively charged surface (Fig. 3C). Acidic residues are involved in hydrogen bonding interactions with the substrate whereas aromatic residues participate in stacking interactions with the carbohydrate ring.
Notably, P1D exhibits a large cleft at the corresponding position (~30 × 10 Å, with a depth of 3-7 Å; Figs. 1C and 3). The shape and dimensions of the P1D cleft are similar to those observed in the lectin-like domain. However, the negative surface potential is not maintained in P1D (Fig. 3C) and the residues involved in carbohydrate binding are not conserved. The analysis of the P1D cleft shows a high conservation within the P1 domains of YidC (Fig. 4A) with a major contribution of hydrophobic residues (Fig. 3C). Lys-289 introduces a positive charge in the middle of the cleft. Given these molecular details, the P1D cleft is unlikely to accommodate polysaccharides.
Although P1D shares the protein fold and the overall position and architecture of the binding pocket with its structural homologs, it seems designed for different ligands. The dimensions and electrostatic properties of the cleft make it well suited to accommodate an extended and predominantly nonpolar molecule.
The P1D Cleft Contains a Ligand-The structure of P1D contains an elongated electron density in the binding cleft (Fig. 4). It was interpreted as an ordered polyethylene glycol molecule (PEG 400) that arises from the crystallization buffer. The PEG/protein interface corresponds to a buried surface area of 807 Å² and is formed by 22 residues mainly from the S1 layer. The interactions are predominantly hydrophobic (including Phe-193, Tyr-275, Asn-273, and Gln-291) except for hydrogen bonds formed between the PEG molecule and Arg-128 and Lys-289, and two hydrogen bonds mediated by water molecules (Fig. 4B). The center of the cleft and especially the two aromatic residues interacting with the PEG molecule (Phe-193 and Tyr-275) are highly conserved in the P1 domains of YidC. Positively charged or polar residues (Lys, Arg, or Gln) are mainly found at position 289 but are replaced in some bacteria by Ile, Val, or Thr, further enhancing the hydrophobic character of the cleft. The high degree of conservation suggests that all P1 domains bind similar ligands (Figs. 1D and 4A).
PEG molecules are known to occupy clefts that naturally bind elongated polymers, such as polypeptides in chaperones (55, 56), in peptide deformylase (57, 58), and in the neuronal calcium sensor (59), or long acyl chains in enzymes (59-61) and in a periplasmic lipoprotein localization factor (62). The interaction network described for these examples is similar to the one observed in the P1D structure. The PEG molecule described here could therefore mimic a natural P1D ligand. P1D may bind acyl chains of peptidoglycans or lipopolysaccharides as described for the periplasmic folding factor Skp (63). Alternatively, P1D could accommodate an elongated peptide chain from an interacting protein or from an unfolded polypeptide and therefore act as a chaperone. All the Sec-independent YidC substrates identified to date contain short periplasmic tails that, judging from their primary structure, could interact with the P1D cleft (19, 64, 65). Other potential ligands include Sec-dependent YidC substrates, e.g. lipoproteins (20), or periplasmic proteins such as periplasmic folding factors (66). Because deletion of the major part of P1D corresponding to the β-supersandwich domain does not impair the insertase function of YidC (30), it is tempting to assign an independent function to the β-supersandwich region present only in Gram-negative bacteria. However, only a small number of YidC substrates are known and have been tested for P1D requirement.
Interaction of P1D with SecF-The association of E. coli YidC with the Sec translocon seems unique within the YidC/Oxa1/Alb3 family, and the P1 domain might play a role in this interaction (21). The P1 domain was shown to interact with SecF based on copurification experiments with the SecDFYajC complex (67). Residues 215-265 of the P1 domain were identified as the region of interaction by analysis of deletion mutants (30). However, the P1D structure clearly indicates that these deletions severely affect or even destroy the β-supersandwich fold. The proposed interaction site localizes to an exposed edge of the β-supersandwich fold and includes the end of the flexible loop β10-β11; helix α1, which is involved in a crystallographic contact; and the β-strands 11-15 (Fig. 5A). This region is not well conserved in YidC of Gram-negative bacteria (Fig. 5B) with the exception of β-strands β14 and β15, which are involved in hydrophobic interactions with the C-terminal linker region. Although the proposed interaction site localizes in the vicinity of the cleft, it does not overlap with it. Therefore, P1D might be able to interact with different binding partners independently but also simultaneously. In an effort to analyze the SecF/P1D interaction in more detail, we used one-dimensional NMR titrations and isothermal titration calorimetry experiments. However, a specific interaction of the periplasmic domain of SecF (residues 48-140) with P1D was not observed (data not shown). If this interaction occurs in vivo it might have a low binding affinity or it might be transient, as also previously suggested (30).
Membrane Interaction of the P1 Domain-The region covering residues 323-346 of E. coli YidC was shown to be essential for cell growth and membrane insertion of both Sec-dependent and Sec-independent substrates (25, 30). It locates at the C terminus of the periplasmic domain and is conserved within the YidC/Oxa1/Alb3 family (Figs. 1D and 6A). Residues 323-329 form the α3 helix in the P1D structure and together with the helix α2 pack against the β-supersandwich via a hydrophobic and highly conserved interface (Fig. 6B). The region including residues 330-346 is not unambiguously defined because it is either connecting P1D to TM2 or it is already part of TM2, as suggested by the standard programs (Fig. 6A). We therefore analyzed this sequence in more detail by different prediction tools (68, 69) (Fig. 6, A and C). The analysis shows that α3 is likely to extend to residue 338 with a hydrophobic C terminus favorable for membrane interaction. The adjacent sequence (residues 340-355) is predicted as an amphipathic helix (Fig. 6C) that could attach or even insert into the membrane (70, 71). This region is conserved in YidC homologs that do not have a P1D, suggesting an important and conserved role of this region for the function of YidC (see above). In addition, we tested whether P1D contributes to the membrane interaction. Flotation assays indicated that P1D alone interacts with E. coli membrane lipids (data not shown; experimental conditions as in Parlitz et al., Ref. 72). For the proposed functions of P1D it would be interesting to know how P1D, and especially the binding cleft, orient with respect to the TM part of YidC. Based on the analyses reported here, we derived two models (Fig. 6D). In Model 1, the cleft of P1D is accessible from the membrane pore and therefore could participate in folding of YidC substrates (see above). In Model 2, the cleft is oriented more away from the membrane and could serve as an interaction site for periplasmic molecules (peptidoglycans or lipopolysaccharides) or proteins (see above). More experiments are needed to test these models.
FIGURE 6. Analysis of the C-terminal linker region. A, sequence alignment and secondary structure prediction for the YidC/Oxa1/Alb3 family (E. coli YidC for Gram-negative bacteria, Bacillus subtilis SpoIIIJ for Gram-positive bacteria, Saccharomyces cerevisiae Oxa1 for mitochondria, and Arabidopsis thaliana Alb3 for chloroplast). A multiple alignment was originally generated with 160 YidC sequences. Secondary structure elements in P1D structure (YIDC_ECOLI) are indicated at the top. The cylinders represent predicted α helices (PredictProtein server) (68). Potential starting points of the transmembrane segment are indicated by arrows. They were predicted using four algorithms (1, HMMTOP; 2, TopPred/TMHMM1; 3, PHD). B, interaction of the helices α2 and α3 with the β-supersandwich region. The hydrophobic residues in the interface are colored in magenta. C, analysis of the C-terminal part of the P1 domain by the Amphipaseek server (69). The upper panel shows the prediction score, and the lower panel represents this region as a helical wheel. D, models for the orientation of P1D toward the membrane. The position of the cleft is indicated by an arrow.
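The notion of an amphipathic helix used in this section can be made concrete with the hydrophobic moment of Eisenberg and co-workers, sketched below. This is a simpler, swapped-in measure and not the algorithm of the Amphipaseek server cited above. The scale values are the Eisenberg consensus hydrophobicities as commonly tabulated and should be checked against the primary source before any quantitative use; the 100° rotation per residue assumes an ideal α helix, and the peptide in the usage example is a synthetic Leu/Lys pattern, not a YidC sequence.

```python
import math

# Eisenberg consensus hydrophobicity scale (values as commonly tabulated;
# treat them as an assumption and verify before quantitative use).
EISENBERG = {
    "A": 0.62, "R": -2.53, "N": -0.78, "D": -0.90, "C": 0.29,
    "Q": -0.85, "E": -0.74, "G": 0.48, "H": -0.40, "I": 1.38,
    "L": 1.06, "K": -1.50, "M": 0.64, "F": 1.19, "P": 0.12,
    "S": -0.18, "T": -0.05, "W": 0.81, "Y": 0.26, "V": 1.08,
}


def hydrophobic_moment(sequence: str, degrees_per_residue: float = 100.0) -> float:
    """Mean hydrophobic moment per residue of a peptide modeled as a helix.

    Each residue contributes a vector whose length is its hydrophobicity
    and whose direction rotates by degrees_per_residue around the helix
    axis. A large resultant means hydrophobic residues cluster on one face.
    """
    if not sequence:
        raise ValueError("sequence must not be empty")

    sum_sin = 0.0
    sum_cos = 0.0
    for position, residue in enumerate(sequence.upper()):
        hydrophobicity = EISENBERG[residue]  # KeyError for non-standard letters
        angle = math.radians(position * degrees_per_residue)
        sum_sin += hydrophobicity * math.sin(angle)
        sum_cos += hydrophobicity * math.cos(angle)

    return math.hypot(sum_sin, sum_cos) / len(sequence)


# Synthetic 18-residue peptides (five full turns at 100 degrees per residue):
# Leu on one face and Lys on the other gives a large moment,
# whereas a uniform poly-Leu helix has no preferred face.
print(hydrophobic_moment("LKKLLKKLLKLLKKLLKK"))
print(hydrophobic_moment("L" * 18))
```

A sliding window of this function along a sequence gives a profile loosely comparable in spirit to a per-residue amphipathicity score, although dedicated predictors use additional features and calibration.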

CONCLUSIONS
The structure of P1D consists of a β-supersandwich fold similar to carbohydrate binding proteins and an α-helical linker region. P1D contains an elongated binding cleft occupied by a PEG molecule that could mimic a natural ligand. The P1D cleft could be utilized by substrate proteins according to a membrane chaperone function of YidC, which would require a location of the cleft in close proximity of the membrane pore (Fig. 6D, Model 1). The cleft could also bind periplasmic molecules or proteins, pointing to an additional function (Fig. 6D, Model 2). The C-terminal part of the linker region is important for the YidC insertase function and is conserved in YidC homologs in chloroplasts and mitochondria, suggesting a functional role conserved within the YidC/Oxa1/Alb3 family. In E. coli YidC, the C-terminal region of the P1 domain might be involved in orienting P1D with respect to the membrane, probably through an amphipathic region adjacent to TM2. The structure presented here provides the basis for a detailed analysis of the periplasmic domain and its importance for the multiple functions of YidC proteins.