Molecular modeling and site-directed mutagenesis of plant chloroplast monogalactosyldiacylglycerol synthase reveal critical residues for activity.

Monogalactosyldiacylglycerol (MGDG), the major lipid of plant and algal plastids, is synthesized by MGD (or MGDG synthase), a dimeric and membrane-bound glycosyltransferase of the plastid envelope that catalyzes the transfer of a galactosyl group from a UDP-galactose donor onto a diacylglycerol acceptor. Although this enzyme is essential for biogenesis, and therefore an interesting target for herbicide design, no structural information is available. MGD monomers share sequence similarity with MURG, a bacterial glycosyltransferase catalyzing the transfer of N-acetyl-glucosamine on Lipid 1. Using the x-ray structure of Escherichia coli MURG as a template, we computed a model for the fold of Spinacia oleracea MGD. This structural prediction was supported by site-directed mutagenesis analyses. The predicted monomer architecture is a double Rossmann fold. The binding site for UDP-galactose was predicted in the cleft separating the two Rossmann folds. Two short segments of MGD (beta2-alpha2 and beta6-beta7 loops) have no counterparts in MURG, and their structure could not be determined. Combining the obtained model with phylogenetic and biochemical information, we collected evidence supporting the beta2-alpha2 loop in the N-domain as likely to be involved in diacylglycerol binding. Additionally, the monotopic insertion of MGD in one membrane leaflet of the plastid envelope occurs very likely at the level of hydrophobic amino acids of the N-terminal domain.

* This work was supported by Ministère de la Recherche and by Oseo Agence Nationale de la Valorisation de la Recherche Rhône-Alpes. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact. The on-line version of this article (available at http://www.jbc.org) contains supplemental Fig. 1S. 1 Both authors contributed equally to this work. 2 To whom correspondence should be addressed.
The evolution from a two-enzyme MGDG synthesis in cyanobacteria to the one-enzyme synthesis in algae and plants is one of the major puzzles in understanding the history of plastids. As soon as the first MGDG synthase (csMGD1) was molecularly characterized in cucumber (Cucumis sativus), the search for potential cyanobacterial homologues was conducted (11). However, based on the primary sequences, no gene candidate for cyanobacterial galactolipid synthesis could be identified (TABLE ONE). It is possible that the MGDG synthetic machinery is phylogenetically unrelated between cyanobacteria and eukaryotes. The picture is likely more complicated, because the MGDG synthase evolution cannot be fully traced in eukaryotes either. A collection of MGDG synthase (MGD) genes is now well established in Angiosperms, molecularly characterized in spinach (Spinacia oleracea) (12), thale cress (Arabidopsis thaliana) (13), and rice (Oryza sativa) (14). TABLE ONE gives a summary of what is currently known about MGDG synthases and related enzymes involved in glycolipid syntheses. MGD orthologues have been identified in the moss Physcomitrella patens, the green algae Chlamydomonas reinhardtii and Prototheca wickerhamii, and the red alga Cyanidioschyzon merolae. In the case of glaucophytes like Cyanophora paradoxa, whose plastids preserved a cyanobacterial peptidoglycan wall, it is not yet clear whether MGDG is synthesized by a one-enzyme or a two-enzyme process. In eukaryotes that contain complex plastids inherited from a secondary endosymbiosis (Euglenids, Chlorarachniophytes, Cryptomonads, Haptophytes, Heterokonts, Dinoflagellates, and Apicomplexa) (7), MGD orthologues could only be found in a Heterokont, the diatom Thalassiosira pseudonana. In Euglenids (Euglena gracilis) or Apicomplexans for which we have abundant genomic information (Plasmodium falciparum and Toxoplasma gondii), no MGD candidate gene could be identified using classic bioinformatic tools. The lack of an MGD orthologue in these organisms is surprising, because chloroplastic galactolipid syntheses could be assessed in Euglena (15) or in Toxoplasma and Plasmodium (16). Because fold is more conserved than sequence by evolution, three-dimensional structure comparison is a powerful means to establish relatedness of proteins (17), even in the absence of sequence similarity. Some clues to comprehend galactolipid synthesis in cyanobacteria, glaucophytes, euglenids, or apicomplexans might therefore benefit from the knowledge of the molecular structure of Angiosperm MGD.
Synthesis of MGDG is a key process for the biogenesis of plastid membranes, particularly for thylakoid expansion (12,13,18). MGDG is also involved in the functional integrity of the photosynthetic machinery (19). Arabidopsis mutants containing half the normal MGDG amount are consistently severely affected, with defects in chloroplast development, impairment of photosynthesis, and an overall chlorotic phenotype. MGDG is the substrate for another essential lipid, DGDG (20,21), which is exported to plasma membrane (22) and mitochondria (23) under phosphate deprivation, likely to replace missing phosphatidylcholine (24,25). In addition to plastid membranes, MGDG synthesis is therefore essential for the biogenesis of most cell membranes.
Taken together, the roles played by MGDG are vital and imply that MGD enzymes are potent targets for herbicide screening (26). Therefore, the MGD three-dimensional structure would also be an important starting point to dissect the MGD molecular mechanism and orientate the rational design of herbicide candidates.
In Angiosperms, MGDG production is restricted to the two membranes of the envelope that surrounds plastids (27-29). Our current knowledge about MGDG synthase function, structure, and membrane topology is established from the enzymatic activity of purified chloroplast envelope from spinach (30,31), later attributed to soMGD1 (12). In A. thaliana, synthesis of MGDG is catalyzed by a family of three proteins (atMGD1, atMGD2, and atMGD3) whose activity and subcellular targeting to plastids were analyzed in depth (13,18). Unfortunately, our attempts to crystallize MGD proteins from either of these models, spinach or Arabidopsis, after functional expression in Escherichia coli (26), were unsuccessful.
Glycosyltransferases have been hierarchically classified on the basis of sequence similarities and stereochemistry of the reactions (32). Despite their number and functional diversity, glycosyltransferases fall into two major protein fold superfamilies named GT-A and GT-B, respectively (32,33). This classification is constantly evolving and updated (CAZy classification available at afmb.cnrs-mrs.fr/CAZY/) (34). From sequence alignments and fold recognition, MGD enzymes are members of the GT-B superfamily. The closest homologues of MGD proteins were found to be MURG glycosyltransferases (11), with which they are classified in the GT-28 family of the CAZy systematics. To date the GT-28 family contains only one three-dimensional structure, i.e. that of E. coli MURG (ecMURG) (35,36). MURG catalyzes the last intracellular step in bacterial and cyanobacterial peptidoglycan biosynthesis, i.e. transfer of an N-acetyl-β-D-glucosaminyl from a UDP-α-D-N-acetyl-glucosamine (UDP-GlcNAc) donor onto lipid 1 (TABLE ONE). For both MGD and MURG activities, a β-glycosyl moiety is transferred from a UDP-α-sugar donor to a hydrophobic acceptor, after an (α→β) inversion of the anomeric configuration of the sugar (TABLE ONE). It is therefore tempting to suppose that MGD from algae and plants derived from a MURG sequence of the ancestral symbiotic cyanobacteria.
Here, we used the ecMURG structure as a template for secondary structure comparison and fold prediction. This approach was combined with enzymological data and phylogenetical comparisons to deduce molecular models for spinach soMGD1 and Arabidopsis atMGD1, atMGD2, and atMGD3, focusing particularly on the active site. The soMGD1 model was then challenged and sustained by site-directed mutagenesis analyses.
Wild-type soMGD1 Expression Vector and Site-directed Mutagenesis via PCR-The soMGD1 sequence used in this study was inserted in the NdeI-BamHI cloning site of the pET-y3a vector (12). The full-length wild-type (WT) soMGD1 refers to the coding sequence lacking its predicted chloroplast transit peptide (12), i.e. from leucine 99 to alanine 522. Mutations were introduced into the cloned soMGD1 by using the QuikChange site-directed mutagenesis kit (Stratagene). Mutations were confirmed by sequencing (Genome Express, Grenoble).
Recombinant Wild-type and Mutated soMGD1 Functional Expression in E. coli at 28°C-Isolated colonies of transformed E. coli (BL21-DE3) were inoculated in LB medium (2.5 ml, 100 µg/ml carbenicillin) and grown at 37°C. When A600 reached 0.5, the cell suspension was transferred to 15 ml of LB medium (100 µg/ml carbenicillin) and grown at 37°C until A600 reached 0.5. Cells were then transferred to 400 ml of LB medium (100 µg/ml carbenicillin) and grown until A600 reached 0.5. Isopropyl-β-D-thiogalactopyranoside (0.4 mM) was subsequently added to induce soMGD1 expression, and the suspension was incubated at 28°C for 4 h. Cells were harvested by centrifugation and stored at −20°C.
Detergent Solubilization of Wild-type and Mutated soMGD1 and Purification by Hydroxyapatite Chromatography-All the operations were carried out at 4°C. After expression of soMGD1, bacterial pellets (1 mg of proteins) were incubated for 30 min in 1 ml of medium A (6 mM CHAPS, 1 mM dithiothreitol, 10 mM MOPS-KOH, pH 7.8) containing 50 mM KH2PO4/K2HPO4. The mixture was centrifuged for 30 min at 7,000 × g. The pellets containing inclusion bodies of improperly folded and inactive soMGD1 polypeptides (26) were discarded. The supernatants, containing soMGD1 enzymes extracted from bacterial membranes by CHAPS and solubilized in mixed micelles (12,13,26), were loaded onto the top of a 5 × 15-mm column containing 500 µl of a Hydroxyapatite-Ultrogel (LKD) matrix equilibrated with buffer A, at a 1 ml/min flow rate (37). The column was washed with 2 ml of buffer A. The matrix-bound soMGD1 wild-type and mutated proteins were eluted with 2 ml of 500 mM KH2PO4/K2HPO4 in buffer A. Fractions were collected and used for enzymatic assays and protein determination.
MGDG Synthase Enzymatic Assay-Enzyme activity was assayed in mixed micelles at 25°C (30). Phosphatidylglycerol (1.3 mM) and DAG (160 µM) dissolved in chloroform were first introduced into glass tubes. After evaporation of the solvent under a stream of argon, 300 µl of incubation medium containing 4.5 mM CHAPS, 1 mM dithiothreitol, 250 mM KCl, 250 mM KH2PO4/K2HPO4, and 10 mM MOPS-KOH, pH 7.8, and purified soMGD1 were added. The mixture was mixed vigorously and kept 30 min at 25°C for equilibration of mixed micelles. The reaction was then started by addition of 1 mM UDP-[14C]galactose (37 Bq·µmol−1) and stopped by addition of chloroform/methanol (1:2, v/v). The lipids were subsequently extracted (38), and the radioactivity of the 14C-labeled MGDG ultimately produced was determined by liquid scintillation counting. Activity is expressed in micromoles of incorporated galactose·h−1·mg protein−1.
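The conversion from counted radioactivity to the activity unit used in this report can be sketched as follows. This is a minimal illustration, not the authors' procedure: it assumes the specific radioactivity of the donor reads as 37 Bq per micromole (the unit prefix is damaged in the text), it ignores counting efficiency and background, and the function name, the reaction time, and the example readings are hypothetical.

```python
def specific_activity(bq_in_mgdg, reaction_min, protein_mg, bq_per_umol=37.0):
    """Return micromoles of galactose incorporated per hour per mg protein.

    bq_in_mgdg   : radioactivity recovered in the MGDG fraction (Bq)
    reaction_min : reaction time in minutes (hypothetical parameter)
    protein_mg   : amount of protein in the assay (mg)
    bq_per_umol  : specific radioactivity of UDP-[14C]galactose (assumed unit)
    """
    if reaction_min <= 0 or protein_mg <= 0:
        raise ValueError("reaction time and protein amount must be positive")
    umol_galactose = bq_in_mgdg / bq_per_umol   # Bq -> micromoles of galactose
    hours = reaction_min / 60.0                 # minutes -> hours
    return umol_galactose / hours / protein_mg


# Hypothetical reading: 74 Bq recovered in MGDG after 30 min with 0.5 mg protein.
example = specific_activity(74.0, 30.0, 0.5)
print(f"{example:.1f} micromol galactose / h / mg protein")
```

Keeping the specific radioactivity as a parameter makes it easy to reuse the same conversion with a donor batch of different labeling.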
Phylogenetic Reconstruction-MGD and MURG phylogeny was inferred from the 22 sequences described under "Materials and Methods." Distance between sequences was computed using the TULIP methods (39,40). Alignment was achieved with the Smith-Waterman method and the BLOSUM 62 scoring matrix, using the Biofacet package from Gene-IT, France (41). We computed estimated z-scores with 2000 sequence shuffling (39,42). Distance between sequences was then calculated using the z-score matrix (40). Tree topology was generated using the Neighbor-Joining algorithm (43).
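The principle of a shuffling-based z-score can be illustrated with a short sketch. It is not the TULIP or Biofacet implementation: it assumes a toy match/mismatch scoring scheme with a linear gap penalty in place of BLOSUM 62, uses far fewer shufflings than the 2000 reported here, and the function names and the example sequence are hypothetical.

```python
import random
from statistics import mean, pstdev


def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score (toy scoring, linear gap penalty)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


def shuffle_z_score(a, b, n_shuffles=200, seed=0):
    """z-score of the observed score against scores of shuffled copies of b."""
    rng = random.Random(seed)
    observed = smith_waterman_score(a, b)
    letters = list(b)
    null_scores = []
    for _ in range(n_shuffles):
        rng.shuffle(letters)  # preserves composition, destroys order
        null_scores.append(smith_waterman_score(a, "".join(letters)))
    mu, sigma = mean(null_scores), pstdev(null_scores)
    if sigma == 0:
        return float("inf") if observed > mu else 0.0
    return (observed - mu) / sigma


# Hypothetical example: a sequence compared with itself scores far above shuffles.
seq = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"
print(round(shuffle_z_score(seq, seq), 1))
```

Shuffling keeps the amino acid composition constant, so the z-score reflects the contribution of residue order to the alignment score rather than composition bias.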
Homology Modeling-Homology modeling of soMGD1 was based on the crystal structure of ecMURG (36) using the sequence alignment described in this report. α-Helix and β-strand distributions in the MGD sequences were predicted with several secondary structure prediction servers (www.sbg.bio.ic.ac.uk/~3dpssm/ and www.npsa-pbil.ibcp.fr/cgi-bin) (44,45). The hydrophobic cluster analysis method was used to refine the sequence alignment (46). Hydrophobic cluster analysis is a graphical method based on the detection and comparison of hydrophobic clusters that are presumed to correspond to the regular secondary structure elements constituting the architecture of globular proteins. Structurally conserved regions, corresponding to secondary structure elements, were built with the COMPOSER homology modeling module (47) of the software Sybyl (Tripos Inc., St. Louis, MO). Loops were subsequently modeled from a general non-redundant protein fold data base in COMPOSER. Two protein segments (Asp69-Asn80 and Thr296-Ile305 in the soMGD1 sequence, called the β2-α2 and β6-β7 loops) could not be modeled and were not considered in the final model. Minor steric conflicts and local conformational problems were analyzed with PROCHECK (48), and local optimization was performed in several cycles. The final model includes hydrogen atoms and partial atomic charges derived using the Pullman procedure.
Docking of Nucleotide Sugar in the Binding Site-The UDP-Gal molecular structure was docked into the proposed active site of soMGD1 in an orientation and conformation similar to that of UDP-GlcNAc in the ecMURG crystal structure (36). Several cycles of energy minimization were performed to optimize both the geometry of the ligand and the interacting protein side chains in the binding site and its vicinity. Energy calculations were performed using the Tripos force field (49) in the Sybyl package together with energy parameters derived for carbohydrates (50) and for nucleotide sugars (51).

Homology of MGD Sequences throughout Evolution-
The fully sequenced genome of A. thaliana contains three MGD proteins, atMGD1, atMGD2, and atMGD3 (18). These isoforms could be phylogenetically grouped into two types, the A-type (atMGD1) and the B-type (atMGD2 and atMGD3), having different substrate specificities with regard to DAG molecular species and different physiological functions. Awai et al. (13) showed that non-conserved residues could serve as signatures for A- and B-types. On a primary sequence basis, the A-type is characterized by a canonical chloroplast transit peptide of ~100 amino acids, whereas the B-type exhibits a shorter addressing sequence of ~30 residues. Numerous full-length and cloned A-type MGD sequences are characterized in other Angiosperms (in S. oleracea, soMGD1; C. sativa, csMGD1; N. tabacum, ntMGD1; and G. max, gmMGD1), probably because this type is devoted to thylakoid biogenesis and is consistently highly expressed. By contrast, Awai et al. (13) could only trace B-types in gene fragments from unfinished plant genomes. It is now possible to describe in the sequenced genome of another plant model, i.e. O. sativa (rice), a second example of a multigenic family comprising an A-type (osMGD1) and a B-type (osMGD2).
We sought the occurrence of MGDG synthases in other groups and identified MGD sequences in the moss P. patens, ppMGD (full-length); the green algae C. reinhardtii, crMGD (fragment) and P. wickerhamii, pwMGD (fragment); and the red alga C. merolae, cmMGD (full-length). In the complete genome sequence of C. merolae, only one MGD gene is predicted. In the diatom T. pseudonana, three different MGD sequences could be detected, tpMGD1 (full-length), tpMGD2 (fragment), and tpMGD3 (fragment). In diatoms, plastids derive from a secondary endosymbiosis (7) and are surrounded by three membranes, the outermost being connected to the endomembranes (reticulum). Consistently, the full-length tpMGD1 exhibits a bipartite N terminus with a predicted signal peptide (targeting to endomembranes) upstream from a chloroplast transit peptide (targeting to chloroplast envelope membranes). Because we do not know the complete sequences of tpMGD2 and tpMGD3, the precise subcellular localization of tpMGD1, tpMGD2, and tpMGD3 and their physiological functions, the numbering of tpMGD sequences was set arbitrarily and is not related to Angiosperm A-or B-types.
Multiple alignments built with MGD sequences from distant species (see supplemental Fig. 1S) highlight the residues conserved throughout evolution. No amino acid conservation is detected in N-terminal chloroplast-addressing sequences. Rather strong conservational pressure is noticed in most of the mature proteins, with highly conserved domains such as two G-loops (G-loop 1, SDTGGGHRASA, and G-loop 2, VLXXGGG(E/D)GXG). In our attempts to predict an MGD structure, we used the residue conservation profile observed in eukaryotic MGD sequences to identify amino acids that are likely to be important for catalysis, particularly in the N-domain of the protein that diverges from the N-domain of MURG and that is likely involved in acceptor binding (see below).
Comparative Analysis of the Sequence and Substrate Evolution in MGD and MURG Phylogeny-Fig. 1 shows the phylogenetic tree built with ten MGD mature proteins identified in plants and protists and eight MURG sequences sampled in bacteria and cyanobacteria. Phylogeny was reconstructed using the TULIP tree method (39,40). This tree recovers the A- and B-type MGD from Angiosperms in two clusters. The Bryophyte sequence (ppMGD) shares features with Angiosperm A-type. When computed with partial sequences from C. reinhardtii (crMGD) and P. wickerhamii (pwMGD), the green algae MGD proteins branch with the Angiosperms plus Bryophyte clade (not shown), indicating the close relationship between MGD of green algae and plants (Viridiplantae), in a "green lineage" subgroup. The red algal sequence from C. merolae (cmMGD) is set between Viridiplantae and Heterokonts, consistent with the origin of Heterokont plastids in a secondary endosymbiosis with a red alga (7). Thus, the phylogeny of MGD comprises a second subgroup, corresponding to the "red lineage" (Fig. 1). This red lineage subgroup of MGD sequences is more closely related to MURG sequences (Fig. 1).
In our study, we sought to connect structural and functional information from ecMURG with functional information from soMGD1 to deduce structural features for this enzyme. Discontinuities between MURG and red lineage MGD and between red lineage and green lineage MGD might provide valuable information, particularly regarding the acceptor site. In addition to the acceptor discontinuity that characterizes MURG (Lipid 1) and MGD (DAG), there is a subtle discontinuity in the acyl-species used to synthesize MGDG in plastids from chloroplasts (plants and green algae) and rhodoplasts (red algae), i.e. DAG of C16-C18 acyl-length in plants (52) and green algae (53-55) and DAG of C16-C18-C20 acyl-length in red algae (55,56). In Angiosperms, Awai et al. (13) had shown that A- and B-types also had different enzymological specificities regarding DAG molecular species, i.e. B-type enzymes exhibited a higher affinity for C18:2/C18:2 DAG than for C18:1/C16:0 DAG, whereas A-type MGD did not exhibit different selectivity for these substrates. Looking back at the multiple alignment of MGD sequences (supplemental Fig. 1S), we therefore sought regions that might correlate with such phylogenetic and acceptor discontinuities. Close to G-loop 1, we notice a DXWX(E/D)XXXWP segment of ~10 amino acids in Angiosperms and Bryophyte (supplemental Fig. 1S) that shows a DXWKEYXGWP profile in Angiosperm B-type, a DLWX(E/D)HTPWP profile in Angiosperm A-type, and a divergent content in red algae and Heterokonts, and that appears as an extra domain when compared with MURG (see Fig. 2, amino acid segment called the β2-α2 loop, see below).
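A profile such as DXWX(E/D)XXXWP can be searched in a protein sequence with a regular expression, where X stands for any residue. The sketch below is only an illustration of the pattern described above: the helper name is hypothetical and the two test sequences are made-up strings, not actual MGD sequences.

```python
import re

# D-X-W-X-(E/D)-X-X-X-W-P, with X = any residue.
MOTIF = re.compile(r"D.W.[ED].{3}WP")


def find_motif(sequence):
    """Return (1-based start position, matched segment) for each occurrence."""
    return [(m.start() + 1, m.group()) for m in MOTIF.finditer(sequence)]


# Hypothetical sequences carrying an A-type-like and a B-type-like segment.
print(find_motif("AAADLWKEHTPWPGG"))   # A-type-like profile DLWX(E/D)HTPWP
print(find_motif("GGDVWKEYTGWPAA"))    # B-type-like profile DXWKEYXGWP
print(find_motif("MKTAYIAKQRQISFVK"))  # no occurrence
```

The type-specific profiles are special cases of the general pattern, so narrower expressions could be derived from it to separate A-type-like from B-type-like segments.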
MGD Fold Prediction Using ecMURG as Template-The x-ray structure of ecMURG free enzyme (35) or in complex with its UDP-GlcNAc substrate (36) consists of two distinct domains similar in size, an N-domain and a C-domain, each containing a three-layer α/β/α sandwich typical of a Rossmann fold. The secondary structure elements in the N-domain are numbered β1/α1/β2 . . . α5/β6, and those in the C-domain β′1/α′1/β′2 . . . α′5/β′6 (Fig. 2). Classic Rossmann folds contain conserved glycine-rich motifs, with the consensus GXGXXG. Fig. 2 shows the multiple alignment of E. coli ecMURG, A. thaliana atMGD1, atMGD2, and atMGD3, and S. oleracea soMGD1, obtained with ClustalW and manually refined using the hydrophobic cluster analysis method so as to optimize the alignment of putative secondary structure elements (46). The different MGD sequences are 60-70% identical. Despite a low percentage of identity with MURG (~20%), the alignment shows a strong conservation of secondary structures (α-helices and β-strands) (Fig. 2). In particular, Rossmann folds of the N- and C-domains and the corresponding G-loop 1 and G-loop 2 are strictly conserved.
FIG. 1. The obtained phylogenetic tree shows related groups corresponding to MGD sequences from primary plastids of green lineage (green), MGD sequences from primary and secondary plastids of red lineage (red), and MURG sequences (gray). The MURG/MGD phylogenetic discontinuity correlates with a functional discontinuity for substrates (Lipid 1/DAG and UDP-GlcNAc/UDP-Gal), and the red lineage/green lineage discontinuity correlates with a functional discontinuity in used DAG molecular species (DAG of C16-C18 acyl-length/DAG of C16-C18-C20 acyl-length). In Angiosperms, the type A and B clusters also correspond to differences in enzyme specificity for DAG molecular species. MGD and MURG segments, for which local similarity profiles correlate with these discontinuities, were analyzed as possible sites for DAG binding (see supplemental Fig. 1S).
Two important insertions are predicted in the N-domain of MGD, between β2 and α2 (the β2-α2 loop) and between β6 and β7 (the β6-β7 loop). Among residues of the C-domain of ecMURG involved in the recognition of the UDP sugar, most are conserved in aligned MGD sequences. In particular, in the ecMURG/soMGD1-aligned residues, we listed the Arg164/Arg313 and Glu269/Glu427 conservation at the positions involved in ribose recognition in MURG, Phe244/Phe402 and Ile245/Val403 at the uracil recognition position, Thr266/Thr424 at the phosphate recognition position, and Gln288/Gln444 and Asn292/Asn448 at the position involved in GlcNAc recognition in MURG (Fig. 2). Interestingly, Gln289, at the position involved in GlcNAc recognition in MURG, is not conserved in MGD and is substituted by a glutamate, Glu445, which may be important in the specificity for galactose. Differences are also detected in G-loop 2 at the Ser192/Gly345 and Gln193/Glu346 positions in the ecMURG/soMGD1 alignment. G-loop 2 is known to bind phosphate. Both G-loop 1 and G-loop 2 cover the donor binding site of MURG, and the observed substitutions might also be important for appropriate binding of substrates in MGD.
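The aligned ecMURG/soMGD1 positions listed above can be tabulated and split into conserved and substituted pairs, as in the following sketch. The residue pairs and roles are taken from the text; the data layout and the function name are illustrative assumptions only.

```python
# (ecMURG residue, soMGD1 residue, role of the position as described in the text)
ALIGNED_PAIRS = [
    ("Arg164", "Arg313", "ribose recognition"),
    ("Glu269", "Glu427", "ribose recognition"),
    ("Phe244", "Phe402", "uracil recognition"),
    ("Ile245", "Val403", "uracil recognition"),
    ("Thr266", "Thr424", "phosphate recognition"),
    ("Gln288", "Gln444", "sugar recognition"),
    ("Asn292", "Asn448", "sugar recognition"),
    ("Gln289", "Glu445", "sugar recognition"),
    ("Ser192", "Gly345", "G-loop 2"),
    ("Gln193", "Glu346", "G-loop 2"),
]


def split_by_conservation(pairs):
    """Separate pairs with identical residue types from substituted ones."""
    conserved, substituted = [], []
    for ec, so, role in pairs:
        # The first three characters are the three-letter residue code.
        target = conserved if ec[:3] == so[:3] else substituted
        target.append((ec, so, role))
    return conserved, substituted


conserved, substituted = split_by_conservation(ALIGNED_PAIRS)
print(len(conserved), "conserved;", len(substituted), "substituted")
for ec, so, role in substituted:
    print(f"{ec} -> {so} ({role})")
```

The comparison tests identity only; a conservative replacement such as Ile245/Val403 is therefore reported as substituted even though the backbone contact described for the uracil would be expected to persist.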
The multiple alignment (Fig. 2) was the basis for building a three-dimensional model of soMGD1 based on the crystal structure of ecMURG. The resulting model is shown in Fig. 3. The soMGD1 structure is built as a double Rossmann fold, with a catalytic site inside the cleft separating the N- and C-domains (Fig. 3). The prediction is interrupted in the N-domain at the level of the β2-α2 and β6-β7 regions, two loops that do not have corresponding sequences in ecMURG and could not be safely built. The roots for the β6-β7 region are oriented toward the floor of the double Rossmann fold. By contrast, the roots for the β2-α2 loop head toward the catalytic site. This orientation, combined with the correlation of the β2-α2 amino acid profile with phylogeny and DAG molecular species specificity, suggests that the β2-α2 loop might be involved in DAG recognition.
Structural Prediction of the Substrate Binding Pocket in soMGD1-The UDP-Gal binding site was predicted using structural information from the ecMURG:UDP-GlcNAc x-ray crystal structure (36). The donor binding site is located in the conserved proline- and glycine-rich regions within the cleft separating the N- and C-domains, on the C-domain side (Fig. 4, A and B). G-loop 1 and G-loop 2 rise symmetrically above the substrate binding pocket (Fig. 4C). Deep inside the cleft, the aromatic ring of a phenylalanine, Phe402, is stacked on the uracil and contacts it closely (Fig. 4, B and C). The uracil is further held in place by contacts from the N(3) and the O(4) atoms to the backbone of Val403. Ribose interacts with Arg313 and Glu427. The galactosyl moiety interacts with two amino acids of the α′5 helix: the acidic group of Glu445 is hydrogen-bonded to O(3), whereas the NH2 of Gln444 bridges both O(3) and O(4). Hydrophobic contacts are also established between the methine hydrogens at C(4) and C(6) and Pro422 (Fig. 4B). Because of the divergence between MURG and MGD in the N-domain, where the acceptor is believed to bind, no prediction was achieved concerning DAG binding. In our model we observe enrichment in hydrophobic amino acids close to the galactose binding site that may be important for DAG docking (data not shown).
In enzymological analyses, MGD activity was lost when the enzyme was incubated with lysine reagents (citraconic anhydride or tert-butoxycarbonyl-L-methionine hydroxysuccinimidyl ester). MGD was protected from lysine-blocking agents by DAG, indicating that one or more important lysine residues were localized in the vicinity of the acceptor site (31). By substrate protection experiments, Maréchal et al. (31) further showed that one or more key histidine and cysteine residues were present in the vicinity of the DAG and UDP-Gal binding sites. Fig. 4C highlights the cysteine, histidine, and lysine residues predicted in the vicinity of the MGD catalytic site, i.e. Cys272, His240, His245, and Lys419 and, to a lesser extent, a more remote Cys378.
MGD Site-directed Mutagenesis and Functional Assay-We mutated recombinant soMGD1 via polymerase chain reaction, in most cases by replacing residues by a small amino acid (WT → Ala) or replacing conserved acidic residues by amines (Asp → Asn, Glu → Gln, and Asp/Glu → Asn), or basic amino acids by an acidic residue (Lys → Glu and Arg → Glu). In one mutant, the GGGEG segment of G-loop 2 was deleted (G-LOOP mutant). Wild type and mutants were expressed in E. coli after isopropyl-β-D-thiogalactopyranoside induction. Analysis of the total proteins from each strain showed the accumulation of a 45-kDa polypeptide (solid arrow, Fig. 5A) corresponding to the monomeric subunit of soMGD1 (12). The G-LOOP soMGD1 mutant had a lower apparent molecular weight (white arrow, Fig. 5A). No soMGD1 protein could be detected in non-induced controls (control, Fig. 5A). Production levels of WT and point-mutated soMGD1 polypeptides were equivalent, whereas an overaccumulation of the G-LOOP mutant was observed. In recombinant E. coli, polypeptides corresponding to soMGD1 are either targeted to the bacterial membranes (0.1%), where the enzyme is active, or massively routed to inclusion bodies as inactive protein. The G-LOOP deletion prevented any proper enzyme solubilization by CHAPS, thus suggesting a role of this region in accurate protein folding. For this mutant, the activity was measured in crude bacteria extracts (Fig. 5B). Based on activity, three classes of mutants were observed. First, the C378A mutant, targeting a Cys in the vicinity of the substrate binding region, was only partially affected, with ~75% of the WT specific activity (Fig. 5B). Likewise, soMGD1 substituted at the level of Cys284, Cys286, and Cys415 did not show any altered activity in crude bacteria extracts (not shown) and were not analyzed further after hydroxyapatite purification. Second, the W171A, E173N, W177A, R380E, N448A, E346A, H245A, and F402A mutants were strongly affected, with only 5-15% of the WT activity.
In the third group, comprising E427A, E445A, K419A, H240A, and the G-LOOP soMGD1, the activity was completely abolished (Fig. 5B). The complete absence of activity in the G-LOOP mutant is not surprising, given the deleterious impact the G-loop deletion can have on the three-dimensional structure. More subtly, a single point mutation within G-loop 2, at the Glu346 position (E346A), was sufficient to reduce the activity. Overall, the decreases in activity observed after mutation of most sites suspected of being important for MGD support the model presented here.
To check our hypothesis on the role of the β2-α2 region in acceptor binding, kinetic parameters were determined for the native enzyme and mutant W171A. To that purpose, WT and W171A soMGD1 proteins expressed in E. coli membranes were extracted and solubilized with CHAPS (12,13,26) and further purified by hydroxyapatite-agarose chromatography (37). The purified fractions are delipidated; it is therefore possible to control the DAG and UDP-Gal availability and determine the corresponding Km according to the surface dilution model (30,59). Under this experimental procedure, the Km measured for the recombinant WT protein with respect to UDP-Gal was ~6-fold higher than that measured for the chloroplast-purified enzyme (30) and was not affected by the mutation (for WT soMGD1, Km(UDP-Gal) = 0.650 ± 0.146 mM; for W171A soMGD1, Km(UDP-Gal) = 0.565 ± 0.565 mM). By contrast, an approximately 2-fold increase in the Km value for DAG was observed (for the WT, Km(DAG) = 0.0120 ± 0.0028 mol-fraction; for the W171A mutant, Km(DAG) = 0.0225 ± 0.0035 mol-fraction). The effect of the W171A mutation on the affinity for DAG supports a possible role of the β2-α2 loop in the enzyme interaction with DAG.
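The comparison of the reported Km values can be checked with a few lines of arithmetic. The sketch below uses only the numbers given above; the helper names are hypothetical, the overlap test on mean ± error is a crude illustration rather than a statistical test, and the Michaelis-Menten expression is used as a simple approximation of the rate at a given DAG mol-fraction.

```python
def fold_change(mutant, wild_type):
    """Ratio of the mutant value to the wild-type value."""
    return mutant / wild_type


def intervals_overlap(mean_a, err_a, mean_b, err_b):
    """True when the two mean +/- error ranges share at least one value."""
    return abs(mean_a - mean_b) <= err_a + err_b


def relative_rate(substrate, km):
    """Michaelis-Menten fractional rate v/Vmax = S / (Km + S)."""
    return substrate / (km + substrate)


# Km values reported in the text (UDP-Gal in mM, DAG in mol-fraction).
km_udp_wt, err_udp_wt = 0.650, 0.146
km_udp_mut, err_udp_mut = 0.565, 0.565
km_dag_wt, err_dag_wt = 0.0120, 0.0028
km_dag_mut, err_dag_mut = 0.0225, 0.0035

print("DAG Km fold change:", round(fold_change(km_dag_mut, km_dag_wt), 3))
print("UDP-Gal ranges overlap:",
      intervals_overlap(km_udp_wt, err_udp_wt, km_udp_mut, err_udp_mut))
print("DAG ranges overlap:",
      intervals_overlap(km_dag_wt, err_dag_wt, km_dag_mut, err_dag_mut))
# Fractional rate at a DAG mol-fraction equal to the WT Km.
print("WT v/Vmax:", round(relative_rate(0.0120, km_dag_wt), 3))
print("W171A v/Vmax:", round(relative_rate(0.0120, km_dag_mut), 3))
```

With these values the DAG Km ratio is 1.875, and at a DAG mol-fraction equal to the WT Km the mutant would run at roughly 35% of its maximal rate against 50% for the WT, under the stated approximation.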
Surface Analysis of soMGD1-Topological studies previously demonstrated the association of soMGD1 with envelope membranes (12). Using the present model, membrane association can be sought by surface analysis. Surface hydrophobic patches can be evidenced in the N-domain as clearly shown in Fig. 3B. Such hydrophobic regions that are found on both sides of the N-domain are expected to interact strongly with the envelope membrane.

DISCUSSION
Before ecMURG was crystallized, providing the valuable structural information used in this study, biochemical analyses had given clues on the MGD catalytic site, membrane topology, and oligomerization (12,31). Using mixed micelles containing DAG, it was shown that the MGDG synthase activity from spinach chloroplast envelope was a sequential, either random or ordered, bireactant system, in which the binding of one substrate did not significantly change the specificity for the cosubstrate. It was therefore possible to estimate the Km values for UDP-galactose and various DAG species, indicating that the enzyme was able to discriminate between DAG molecular species (31). The inhibition by UDP, which was competitive with regard to UDP-Gal and non-competitive with regard to DAG, supported the existence of separate sites for each substrate. Inactivation by amino acid reagents and protection by substrates indicated the existence of key lysine, histidine, and cysteine residues in the vicinity of the catalytic site (31). The occurrence of ~10 cysteine, ~15 histidine, and ~35 lysine residues in MGD sequences did not allow any simple identification of those that were most likely in the vicinity of the substrate binding sites. Sensitivity to a hydrophobic chelating agent, o-phenanthroline, and activity recovery after addition of bivalent metal cations supported the existence of one or more metals associated with the enzyme (26,31). Using antibodies raised against soMGD1, the MGDG synthase from spinach was shown to be associated with membranes in a NaCl-resistant but NaOH-sensitive manner, indicating that the enzyme had no transmembrane span and was rather a "monotopic" enzyme, embedded in one membrane leaflet (12). Finally, the kinetics of soMGD1 inactivation after γ-radiation was consistent with a functional dimer (12). In the present study, we combined the biochemical data on the MGD catalytic site with the structural information learned from MURG to refine the predicted soMGD1 model.
MGD Overall Fold-Based on sequence comparisons (Figs. 1 and 2), the architecture of soMGD1 predicted in this study (Figs. 3 and 4) was sustained by PCR-point mutation analyses (Fig. 5). It consists of two Rossmann folds called the N- and C-domains. The catalytic site is predicted in the cleft between the two domains. The C-domain is the most satisfactorily predicted, with identification and confirmation by point mutation of key residues for UDP-Gal binding. In the N-domain, prediction is incomplete, lacking the two ⟨β2-α2⟩ and ⟨β6-β7⟩ loops that have no counterpart in ecMURG. From phylogenetic comparisons (Fig. 1), we deduced that the ⟨β2-α2⟩ loop, which heads toward the catalytic site, may be involved in catalysis as well.
[Figure legend fragment: B, galactosylation activity assays of the purified WT and mutated soMGD1 proteins. Expressed WT and mutated soMGD1 proteins were extracted from bacterial membranes and solubilized with CHAPS, a zwitterionic detergent, and subsequently purified by chromatography on hydroxyapatite-agarose as described under "Materials and Methods." Specific activity is given in micromoles of galactose incorporated per hour per mg protein.]
Structural Homology Reveals the UDP-Gal Binding Site-The UDP-Gal binding site was predicted using the conformation of the UDP-GlcNAc:ecMURG complex (36) (Fig. 4) and was supported by site-directed mutagenesis experiments in which identified residues proved to be important for soMGD1 activity (Fig. 5). The overall orientation of the nucleotide sugar is conserved after optimization (Figs. 3A, 4A, and 4B). The predicted orientation of UDP in the binding site of soMGD1 involved the same network of hydrogen bonds and stacking contacts as that determined with UDP-GlcNAc in ecMURG. Hydrogen bonds with Gln 444 and Glu 445 are conserved (Gln 288 and Gln 289 in MURG). Some differences in binding and orientation of the glucoside moiety in MURG and the galactoside moiety in MGD1 were observed. In particular, specificity for galactose might derive from the proline residue at position 422 in MGD1. This bulky amino acid does not allow the presence of an equatorial hydroxyl group at position 4 but favors the hydrophobic contact with the methine hydrogen at this position. In MURG, the equivalent amino acid is a less bulky alanine residue that allows an equatorial orientation of the O(4) atom, which establishes a hydrogen bond with the backbone of this residue.
Proposed Residues of the ⟨β2-α2⟩ Loop Involved in the DAG Binding Site-No conclusive structure could be deduced for the acceptor binding site. The enrichment in hydrophobic amino acids close to the UDP-Gal binding site, as compared with MURG, might reflect the necessary environment for DAG, whose hydrophobicity is much higher than that of Lipid 1. The structure of the ⟨β2-α2⟩ loop could not be determined, but its roots head in the direction of the substrate binding pocket (Fig. 4C). That loop is rich in aromatic amino acids that are candidates for hydrophobic interactions and stacking. A possible role for that loop might be to hold the acceptor in place or to discriminate between different acceptor species. Point mutation analyses support the importance of Trp 171, Glu 173, and Trp 177 for activity (Fig. 5). The enzymological analysis of soMGD1 mutated at Trp 171 was carried out after membrane extraction, solubilization in detergent-lipid mixed micelles, purification, and Km computation following the surface dilution model. The overall decrease in specific activity measured with the W171A mutant could only be attributed to a decrease of the affinity for the provided DAG (in this experiment, containing only the dioleoyl-1,2-sn-glycerol molecular species), supporting a possible role of the ⟨β2-α2⟩ loop in the enzyme interaction with DAG. The ⟨β2-α2⟩ loop therefore seems to be an important component of a DAG binding region, possibly in conjunction with remote residues of the soMGD1 sequence.
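The reasoning that a lower affinity for DAG alone can account for a lower specific activity can be sketched numerically. The example below assumes a simplified Michaelis-Menten form written in DAG surface mole fraction, as obtained from a surface dilution treatment when the bulk micelle concentration is saturating. It is not necessarily the exact equation used for the Km computation, and every name and numerical value in it is hypothetical.

```python
def mole_fraction(lipid_mol, detergent_mol):
    """Surface mole fraction of the lipid substrate in a mixed micelle."""
    return lipid_mol / (lipid_mol + detergent_mol)


def surface_rate(x, vmax, km):
    """Michaelis-Menten rate with the substrate given as a mole fraction.

    Simplifying assumption: the bulk micelle concentration is saturating,
    so only the surface (mole fraction) term limits the rate.
    """
    return vmax * x / (km + x)


if __name__ == "__main__":
    # Hypothetical micelle composition: 2 mol DAG per 98 mol detergent.
    x_dag = mole_fraction(2.0, 98.0)

    # Hypothetical constants: identical Vmax, five-fold higher Km in the mutant.
    vmax = 1.0
    km_wild_type = 0.01
    km_mutant = 0.05

    print("wild type:", round(surface_rate(x_dag, vmax, km_wild_type), 3))
    print("mutant:   ", round(surface_rate(x_dag, vmax, km_mutant), 3))
```

With an unchanged Vmax, the measured rate at a fixed, non-saturating DAG mole fraction drops as Km increases. This is the pattern expected if a mutation affects acceptor binding rather than the catalytic step.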
Key Histidine and Lysine Residues in the Vicinity of the Active Site-We identified lysine and histidine residues in the vicinity of the broad substrate binding zone that proved essential for activity (Fig. 5). These amino acids were highly conserved in MGD sequences (supplemental Fig. 1S). Mutation of His 240, His 245, and Lys 419 totally abolished the enzyme activity, thus supporting their role in catalysis. Because these residues do not appear to be in close contact with UDP-Gal, a speculative hypothesis would be that these residues might be involved in acceptor (DAG) binding. In this study, no cysteine residue could be identified as clearly essential for activity. In particular, cysteine residues in two intriguing CXC motifs (CYC in the β6 and CDC in the β′4 strands) were mutated (at the level of Cys 284, Cys 286, and Cys 415), without any noticeable effect (not shown).
Proposed Membrane Association Domain-Previous sequence analyses attempted to explain the monotopic insertion of soMGD1 inside one leaflet of the inner envelope membrane by the detection of amphipathic α-helices (12). Hydrophobic surface patches are clearly evident in the N-domain (Fig. 3B). These hydrophobic regions, found on both sides of the N-domain, are expected to interact strongly with the envelope membrane. It is striking to note that similar hydrophobic patches have been described in ecMURG, which also acts at the membrane surface (35). The cleft region of each MGD monomer (in the vicinity of which we predicted both the UDP-Gal binding site and the roots of the ⟨β2-α2⟩ loop) is likely to face the membrane surface, where the very hydrophobic DAG resides, and to protrude sufficiently to bind the hydrophilic UDP-Gal sugar donor.
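A sequence-level screen for amphipathic α-helices of the kind mentioned above can be sketched as a sliding-window calculation of the helical hydrophobic moment. This is an illustrative sketch only, not the procedure used in (12) nor the three-dimensional surface analysis of the present model. The Kyte-Doolittle scale, the 18-residue window, and the 100° angle per residue are assumptions made for the example, and the function names are hypothetical.

```python
import math

# Kyte-Doolittle hydropathy scale (assumed here; other scales could be used).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydropathy(segment, scale=KYTE_DOOLITTLE):
    """Average hydropathy of a segment."""
    return sum(scale[aa] for aa in segment) / len(segment)


def hydrophobic_moment(segment, delta_deg=100.0, scale=KYTE_DOOLITTLE):
    """Per-residue hydrophobic moment, assuming an ideal alpha-helix.

    Each residue is placed at an angle i * delta around the helix axis and
    weighted by its hydropathy; a large vector sum means that hydrophobic
    residues cluster on one face of the helix.
    """
    delta = math.radians(delta_deg)
    sin_sum = sum(scale[aa] * math.sin(i * delta) for i, aa in enumerate(segment))
    cos_sum = sum(scale[aa] * math.cos(i * delta) for i, aa in enumerate(segment))
    return math.hypot(sin_sum, cos_sum) / len(segment)


def scan_windows(sequence, window=18):
    """Return (start index, mean hydropathy, hydrophobic moment) per window."""
    results = []
    for start in range(len(sequence) - window + 1):
        segment = sequence[start:start + window]
        results.append((start, mean_hydropathy(segment), hydrophobic_moment(segment)))
    return results


if __name__ == "__main__":
    # Hypothetical test peptide, not a fragment of soMGD1.
    for start, hydropathy, moment in scan_windows("LKKLLKKLLKKLLKKLLKAAAA"):
        print(start, round(hydropathy, 2), round(moment, 2))
```

Windows that combine a high hydrophobic moment with a moderate mean hydropathy are candidates for helices lying at a membrane interface, whereas a uniformly hydrophobic stretch gives a high mean hydropathy but a moment near zero.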
Proposed MGD Dimerization Domain-To our knowledge, there is no report of a possible MURG dimerization comparable with that supported for MGD by radiation inactivation kinetics and cross-linking experiments (12,26). The only described arrangement of two MURG proteins is in the x-ray crystal structure, where two protein molecules are observed in the asymmetric unit (35,36). In the MGD model presented here, membrane association seems likely to be localized in the N-domain. Based on this membrane topology, and on the geometric constraints derived from the DAG origin in the membrane and the UDP-Gal origin in the aqueous phase, it may be reasonable to suppose a dimerization involving the protruding C-domain. More investigations are required to explore this hypothesis. An additional open question is whether MURG itself can dimerize.
Question of the Association of MGD with Metal Cations-Association of apo-MGD with metals is experimentally supported by sensitivity of the native and recombinant enzymes to ortho-phenanthroline, a hydrophobic chelating agent, and the subsequent partial recovery of the activity when supplied with metals such as Zn2+, Mg2+, or Cu2+ (26,31). Metal coordination might imply cysteine and/or histidine residues. It is still not clear whether metal coordination might be directly involved in catalysis, stability of the overall structure, and/or MGD dimerization. The present study does not provide sufficient data to support any hypothetical metal binding site(s).

CONCLUSION
Attempts to crystallize and solve the structure of plant MGDG synthases, the enzymes producing the most abundant polar lipid on earth, have so far been unsuccessful. Using MURG as a template, and challenging our predictions by site-directed mutagenesis, we could obtain the overall fold of an Angiosperm MGDG synthase. As expected for a glycosyltransferase of the GT-B superfamily, the architecture of a monomer was that of a double Rossmann fold. The binding site for the UDP-Gal sugar donor was the best predicted region. The structure of two short segments (the ⟨β2-α2⟩ and ⟨β6-β7⟩ loops) that have no counterparts in MURG could not be determined. Combining the obtained model with phylogenetic and biochemical information, we extended our investigation to structural questions that are unique to MGD proteins, namely binding of DAG, insertion into membranes, sites for dimerization, and association with a metal. In particular, we collected evidence that the ⟨β2-α2⟩ loop in the N-domain is very likely involved in DAG binding. Additionally, the monotopic insertion of MGD in one membrane leaflet of the plastid envelope very likely occurs at the level of hydrophobic amino acids superficially exposed in the N-domain. Based on these topological constraints, MGD dimerization would occur in the protruding C-domain. We could not identify any site dedicated to metal association. Future prospects include a refinement of the MGD model, focusing on regions of MGD monomers that likely play important functional roles, i.e. the ⟨β2-α2⟩ loop and parts of the C-domain that may be involved in protein dimerization. Based on MGD and MURG sequence similarity, the question of a possible dimerization (and association with metal) of MURG should also be addressed. Ultimately, the presented structure will be a starting model to investigate possible molecular pharmacological mechanisms of MGD inhibitors and to understand the difficult question of the evolution of MGDG-synthesizing enzymes.