Structure of transmembrane prolyl 4-hydroxylase reveals unique organization of EF and dioxygenase domains

Prolyl 4-hydroxylases (P4Hs) catalyze post-translational hydroxylation of peptidyl proline residues. In addition to collagen P4Hs and hypoxia-inducible factor P4Hs, a third P4H—the poorly characterized endoplasmic reticulum–localized transmembrane prolyl 4-hydroxylase (P4H-TM)—is found in animals. P4H-TM variants are associated with the familiar neurological HIDEA syndrome, but how these variants might contribute to disease is unknown. Here, we explored this question in a structural and functional analysis of soluble human P4H-TM. The crystal structure revealed an EF domain with two Ca2+-binding motifs inserted within the catalytic domain. A substrate-binding groove was formed between the EF domain and the conserved core of the catalytic domain. The proximity of the EF domain to the active site suggests that Ca2+ binding is relevant to the catalytic activity. Functional analysis demonstrated that Ca2+-binding affinity of P4H-TM is within the range of physiological Ca2+ concentration in the endoplasmic reticulum. P4H-TM was found both as a monomer and a dimer in the solution, but the monomer–dimer equilibrium was not regulated by Ca2+. The catalytic site contained bound Fe2+ and N-oxalylglycine, which is an analogue of the cosubstrate 2-oxoglutarate. Comparison with homologous P4H structures complexed with peptide substrates showed that the substrate-interacting residues and the lid structure that folds over the substrate are conserved in P4H-TM, whereas the extensive loop structures that surround the substrate-binding groove, generating a negative surface potential, are different. Analysis of the structure suggests that the HIDEA variants cause loss of P4H-TM function. In conclusion, P4H-TM shares key structural elements with other P4Hs while having a unique EF domain.

Transmembrane prolyl 4-hydroxylase (P4H-TM) is considered to be the fourth HIF-P4H (12,13). It is located at the ER membrane with the catalytic domain inside the ER lumen (13). P4H-TM sequence resembles more closely the C-P4Hs than the HIF-P4Hs (Fig. 2), but instead of procollagen, it was found to hydroxylate HIFα in vitro and to downregulate HIFα in cell culture (12,13). Two putative EF-hand motifs were detected in the P4H-TM sequence N-terminal to the catalytic domain (Figs. 1B and 2) (12). The Ca 2+ -binding EF-hand motifs that were first identified in parvalbumin by Kretsinger and Nockolds (14) are around 30-residue-long helix-loop-helix structures that usually occur in pairs. Ca 2+ binding by seven oxygen groups within the EF-hand loop region modulates the relative orientation of the two helices (15,16). The EF hand containing proteins can function as calcium sensors, generating biochemical responses to changes in cellular calcium concentration, or as calcium buffers, binding free cellular calcium to modulate cellular signaling (15).
P4H-TM is highly expressed in the brain and eye and moderately expressed in the skeletal muscle, lung, heart, adrenal gland, and kidney (13,17,18). Expression has also been reported in the prostate, testis, and thyroid (13). Morpholino KO of P4H-TM in zebrafish embryos resulted in basement membrane defects, impaired eye development, and compromised kidney function (17). In mice, P4H-TM is involved in regulation of erythropoietin levels and erythrocytosis (19). P4h-tm −/− mice develop early-onset aging-associated retinal and renal dysfunction (18), and their behavioral phenotype is characterized by hyperactivity and a dramatic reduction of despair response (20).
Variants of human P4HTM have been linked to a severe disability, the HIDEA syndrome, characterized by intellectual disability, hypotonia, eye abnormalities, hypoventilation, obstructive and central sleep apnea, and dysautonomia (21,22). Exome sequencing revealed five different homozygous or compound heterozygous pathogenic biallelic P4HTM variants in patients from five families from across the world (21,22). Two of the variants lead to premature stop codons, and the remaining three resulted in insoluble protein products, suggesting the disease is linked to the loss of P4H-TM function (22).
Although previous results show that P4H-TM hydroxylates HIF1α and it has been considered to be the fourth HIF-P4H, the catalytic activity was only detected toward the 200-residue oxygen-dependent degradation domain (ODDD) of HIF1α, not toward shorter peptides harboring the prolines whose hydroxylated forms are recognized by the von Hippel-Lindau protein (13). Furthermore, P4H-TM also hydroxylated to a small extent HIF1α ODDD in which the HIF-P4H-targeted prolines were mutated to alanines. In addition, the behavioral phenotype of the P4H-TM KO mice is very different from any other HIF-P4H KO mice (20), and the symptoms of the patients with HIDEA syndrome have not been reported to be linked to HIF-P4H deficiency (21,22). P4H-TM contains a putative EF domain with calcium-binding EF-hand motifs not found in any other characterized 2OGDD superfamily enzyme. The significance of this domain for P4H-TM function, the potential connection of calcium storing/sensing to HIF1α regulation, and the relevance of P4H-TM localization inside the ER are not known. To shed light on the function of P4H-TM, we solved the crystal and solution structures of the protein, analyzed the structures, and used them to predict the effect of the HIDEA variants on the structure and function within the cellular context and in relation to the calcium concentration.

Results
Overall structure of human P4H-TM The structure of the soluble part of human P4H-TM was solved with X-ray crystallography in the presence of A, all P4Hs share the same reaction mechanism and cofactors. B, P4H-TM, HIF-P4H-2, the α-subunit of human C-P4H-I, and C. reinhardtii P4H (Cr-P4H) share the DSBH fold of the catalytic domain (CAT, red) and the catalytic residues (which are indicated), but they have unrelated N-terminal regions. P4H-TM has a cytosolic N-terminal region, a transmembrane helix, and an EF domain inserted into the catalytic domain. The DSBH core of the catalytic domain comes just after the EF domain, residues 310 to 460. The structure of the N-terminal region of HIF-P4H-2 is not known, but it is predicted to contain a MYND-type zinc finger (1). The N-terminal half of the α-subunit of human C-P4H is well characterized and contains a dimerization domain followed by a peptide-substrate-binding (PSB) domain (2). Structural information is missing for the C-P4H catalytic domain and for the linker region between the PSB and the catalytic domain. The functional C-P4H enzyme is an α 2 β 2 heterotetrameric complex between the catalytic α-subunit and the β-subunit/protein disulfide isomerase (not shown). Cr-P4H represents the simplest type of P4H that is lacking an extended N-terminus (3). C-P4H, collagen prolyl 4-hydroxylase; Cr-P4H, C. reinhardtii prolyl 4-hydroxylase; DSBH, double-stranded β-helix; P4H, prolyl 4-hydroxylase; P4H-TM, transmembrane prolyl 4-hydroxylase.
Fe 2+ , N-oxalylglycine (NOG), and Ca 2+ . The crystallized construct consists of residues 88 to 502 of P4H-TM and an Nterminal His-tag, whereas it lacks the short cytoplasmic region and the transmembrane helix (Fig. 1). The crystal structure shows that the P4H-TM fold is composed of two well-defined domains: the catalytic domain with the DSBH fold and the EF domain. The EF domain is inserted just before the core domain of the DSBH fold (Fig. 3, A and C). There are two P4H-TM molecules in the asymmetric unit. These two molecules are related by a noncrystallographic twofold axis (Fig. 3B). The PISA (Proteins, Interfaces, Structures and Assemblies) analyses identified four buried salt bridges: Arg318-Glu270, Arg402-Glu321, Glu270-Arg318, and Glu321-Arg402. There is a fifth and sixth possible (almost buried) salt bridge between Glu267-His276 and His276-Glu267. The latter two interactions are between the EF hands (residues 190-290). The protein-protein interface between the two molecules is extensive, 1400 Å 2 . This interface is rather polar, and PISA calculations using only the protein-protein interactions predict that these two molecules are not forming a stable dimer in the solution (the complexation significance score is zero). Overall, the P4H-TM structure contains 12 β-strands and nine α-helices. The conserved DSBH core is composed of eight antiparallel β-strands that are divided into two sheets that fold against each other. The core DSBH strands are conventionally numbered with roman numerals from I to VIII (Fig. 3A). Two of the core strands of the DSBH minor sheet are heavily disrupted in P4H-TM and do not observe the β-sheet geometry, but the nomenclature is preserved for clarity. The two P4H-TM molecules of the asymmetric unit occupy very similar conformations (RMSD 0.42 Å for all Cα-atoms), and only chain A is discussed unless otherwise stated. The alignment includes human P4H-TM (HsP4HTM), C. reinhardtii P4H (CrP4H), human C-P4H isoform I (HsCP4HI), Bacillus anthracis P4H (BaP4H), and the isoforms 1 and 2 of human HIF-P4Hs (HsHIFP4H1 and HsHIFP4H2, respectively). The presented sequence of P4H-TM starts with Asp88. The secondary structure elements indicated above the sequences are based on the crystal structure of the P4H-TM determined in this study. The β-strands that form the DSBH core are labeled with Roman numerals. The conserved residues are highlighted, and the residues involved in iron chelation (Fe 2+ : His328, Asp330, and His441), 2OG binding (2OG: Tyr319, Thr375, and Lys451), and P4H activity (CAT: Arg273, Trp279, Tyr325, Arg358, and Trp457) are labeled. Disulfides (Ds) and glycosylated (N-acetyl glucosamine [NAG]) residues are also labeled. A distinct feature in P4H-TM compared with the other P4Hs is the EF domain between residues Gln190 and Asn251 that contains two EF hands (EF1 and EF2). CAT, catalytic domain; 2OG, 2-oxoglutarate; DSBH, double-stranded β-helix; P4H, prolyl 4-hydroxylase; P4H-TM, transmembrane prolyl 4-hydroxylase.
The first visible P4H-TM residues are Thr107 and Leu108 in chains A and B, respectively. Including the 6xHis tag, there are 25 disordered residues in the N-terminus that are not visible in the electron density. The visible N-terminus is positioned along the protein surface between helices α7 and α8 and the βVI-βVII loop without a defined secondary structure. Starting at Gly120, the protein forms three consecutive antiparallel β-strands (β1-3) that extend the DSBH major sheet off of βVI, with β3 running antiparallel to βVI. After β3, the first helix α1 is positioned behind the major sheet. α1 is followed by a long and partially disordered loop α1-α2 that reaches the EF domain. The first of the two EF hands is formed by α2, a calcium-binding loop α2-α3, and α3. α3 ends abruptly after six residues and is followed by a 3 10 helix and the α3-α4 loop between the two EF hands. The second EF hand is formed by helices α4 and α5 and the calcium-binding loop α4-α5 in between them. The six-residue α5 is followed, after a turn, by the longer α6. The loop α6-β4 runs antiparallel to the α1-α2 loop back toward the catalytic domain and forms a two-residue β-like extension to the DSBH minor sheet. β4 extends the . P4H-TM structure. A, the P4H-TM structure presented as a secondary structure topology chart where the EF domain is shown in green, the DSBH core in red, and the rest of the protein in cyan. B, presentation of the asymmetric unit dimer viewed down the dimer two-fold axis, toward the peptide binding groove. Chain A is shown in pink and chain B in light blue. Active site residues are shown as stick models and Fe 2+ (orange) and Ca 2+ (green) ions are shown as spheres. C, stereo presentation of the monomeric P4H-TM as a rainbow-colored cartoon model. Active site residues, EF-hand residues, and glycans are shown as stick models and Fe 2+ (orange) and Ca 2+ (green) ions are shown as spheres. DSBH, double-stranded β-helix; P4H-TM, transmembrane prolyl 4-hydroxylase.
P4H-TM structure DSBH major sheet to the opposite direction to the strands 1 to 3. It is followed by two helices, α7 and short α8, that run antiparallel to each other and lead into the first β strand of the DSBH core. βI is located between β4 and βVIII in the DSBH major sheet. βII folds over βI, but its conformation is disrupted by the iron-binding residues and the disulfide in the neighboring βVII strand. βII leads to an extended 30-residue loop βII-βIII, which contains an internal disulfide between Cys340 and Cys357. The disulfide bridge appears to tether the middle part of the loop to the beginning of βIII. βIII is part of the major sheet between βVI and βVIII. Subsequently, βIV folds over it, positioned between βV and βVII, and leads to a 30residue βIV-βV loop that harbors two 3 10 helices and forms several contacts with the EF domain. A disulfide is formed between Cys404 near the end of the βIV-βV loop and Cys444 at the end of βVII. This linkage seems to function as an anchor to determine the position of this large loop. After the loop, βV forms one end of the minor sheet and is positioned antiparallel to the neighboring βIV. A short turn after βV leads to βVI positioned between β3 and βIII. The following loop, βVI-βVII, twists around itself in an extended hairpin-like structure and leads to βVII after a short 3 10 helix. βVII completes the minor sheet between βII and βIV and is followed by the final major sheet strand βVIII. A short, coiled region after βVIII leads to the terminal helix α9. The last visible residue in the structure is Gly481 at the end of α9, only 25 Å away from the visible N-terminus. The last 21 residues of P4H-TM are not visible in the electron density.
Prediction of glycosylation sites within P4H-TM resulted in three potential sites of which two, Asn368 and Asn382, reached the threshold, whereas the third site, Asn348, was just below it. In the structure, Asn368 and Asn382 were found to be glycosylated, whereas Asn348 is located in a part of the βII-βIII loop where the electron density for the side chain was not well defined. Asn368 is in the short βIII-βIV loop and two Nacetylglucosamine residues could be modeled to be attached to it with clear electron density. Eight mannose residues were modeled to the Asn382-linked glycan in the βIV-βV in addition to the two N-acetylglucosamines.
A suggested alternative isoform 3 of P4H-TM (UniProt: Q9NXG6-3) to isoform 1 (UniProt: Q9NXG6-1) contains a 61-residue insertion after Arg358 and appears in some sequence databases as the canonical form of the enzyme (Fig. S1). We expressed isoform 3 with a similar insect cell expression vector used for the isoform 1 lacking the cytosolic part and the transmembrane domain but did not obtain soluble protein for characterization (data not shown). The P4H-TM structure indicates that an insertion at this position would either displace Arg358 and triple the length of the βII-βIII loop or alternatively completely displace the βIII and βIV strands.

EF hand and Ca 2+ binding
The P4H-TM structure contains an EF domain with two EF hands between residues 190 and 251 (Fig. 4A). Calcium is coordinated in a pentagonal bipyramid conformation by seven oxygen ligands from residues listed in Table 1. The Ca 2+coordinating residues are notated based on their geometrical position: X and -X form the tips of the bipyramid and Y, -Y, Z, and the bidentate -Z occupy the vertices of the pentagonal plane. The initial Ca 2+ -binding residue Asp198 in the first EF hand emerges directly from α2. The -Y position of the first EF hand is provided by the main chain carbonyl of His204 while Glu209 links α3 to the Ca 2+ with bidentate binding. Unusually, the initial Ca 2+ -binding residue of the second EF hand, Asp237, does not emerge from α4 directly, but after a linker residue. This displacement could influence the relative movement by α4 and α5 in the event of Ca 2+ dissociation. The coordinating oxygen in the -X position of the second EF hand is provided by a water molecule in chain A. No electron density for a corresponding water molecule was visible in chain B. The main chain carbonyl oxygen of Val243 provides ligand position -Y, and Glu248 from α5 binds Ca 2+ in a bidentate manner.
The overall Ca 2+ -binding affinity of P4H-TM was measured using isothermal titration calorimetry (ITC), and the corresponding K d was 23 μM (Table 2) (Fig. 4B). The modeling used assumes that the two binding sites are identical and independent. The obtained data fit well to this model, and the stoichiometry is accurate. The results suggest that calcium binding to P4H-TM is enthalpy-driven as the negative ΔH term is nearly nine times larger than the slightly positive entropy term -TΔS (Table 2).
CD measurements were used to clarify the structural changes that occur during calcium binding to P4H-TM. Far-UV synchrotron radiation CD measurements did not show a notable change in the secondary structure composition of P4H-TM with the addition of calcium (Fig. 4C). In contrast, the near-UV CD revealed a large shift in the CD signal in the tryptophan region between wavelengths 285 nm and 305 nm when calcium was added to metal-free P4H-TM (Fig. 4D). No shift was induced when magnesium was added to metal-free P4H-TM (Fig. 4D). This signal probably arises when the calcium-dependent conformational change shifts the positions of Trp220 and Trp221 located in the α4-α5 loop between the two EF hands (Fig. 4A). Trp220 interacts with the flexible α1-α2 loop and forms CH-π interaction with Pro175. Trp221 interacts with the βIV-βV loop and is stacked with His326.

Active site
The active sites of the 2OGDD enzymes are remarkable in that the binding of iron and 2OG is highly conserved, but there is a large variation in the binding modes of the hydroxylatable substrates. In P4H-TM, the EF domain extends the peptide binding groove. As the catalytic site with bound iron and the 2OG analogue NOG is located at the catalytic domain side of this groove, this is most likely the site where the P4H-TM substrate will bind. The active site iron is coordinated by His328, Asp330, His441, and the oxygen atoms from C-1 carboxylate and C-2 carbonyl of NOG (Figs. 5 and S2). NOG C-1 carboxylate group is shifted above the plane formed by Asp330, His441, and the NOG C-2 carbonyl group that is P4H-TM structure unlike in a typical 2OGDD active site where the ironcoordinating residues follow octahedral geometry. In addition, there is no water molecule positioned trans to His328.
NOG is coordinated at the C-5 carboxyl group by Tyr319, Thr375, and Lys451 and at the C-1 carboxyl by Asn455 (Fig. S2). The interaction between NOG and Asn455 may P4H-TM structure contribute to the disrupted iron coordination geometry. The P4H-TM iron-binding residues His328, Asp330, and His441 are conserved in other P4Hs (Fig. 2). Of the NOG-binding residues, only Tyr319 is fully conserved (Fig. 2). Lys451 and Thr375 are conserved, except in HIF-P4Hs where the lysine is replaced by an arginine and the threonine with a leucine (Fig. 2). Furthermore, in HIF-P4Hs, the conserved tyrosine corresponding to Tyr365 in P4H-TM interacts with the cosubstrate, whereas in P4H-TM, Tyr365 forms a hydrogen bond to the cosubstrate interacting Thr375. Asn455, although conserved in C-P4H-I, is replaced in most P4Hs by a threonine that does not form similar interaction with the cosubstrate (Fig. 2). NOG-binding in P4H-TM is also altered, compared with homologs, by Gly443, which in homologs usually has a side chain that restricts the 2OG/NOG binding site (Fig. 2). Iron and 2OG/NOG coordinating residues are well conserved in P4H-TM orthologs (Fig. S3).
The residues Arg273, Trp279, Glu312, Tyr325, Arg358, and Trp457 near the P4H-TM active site are conserved in other P4Hs and function in substrate binding or catalysis. The function of these residues can be predicted based on the substrate-peptide containing P4H structures of Chlamydomonas reinhardtii P4H (Cr-P4H) and HIF-P4H-2 (Fig. 5). In these homolog structures, Arg273 directly interacts with the first residue of the lid structure and with the substrate peptide via a water molecule. Trp279 forms stacking interaction with a peptide bond of the substrate peptide in the Cr-P4H structure. Glu312 is not conserved in HIF-P4H-2 but interacts with the substrate peptide and Arg358 in the Cr-P4H structure. Tyr325 is stacked between Arg273 and His328, forms a hydrogen bond to the backbone carbonyl of the substrate peptide residue at position -2 to the hydroxylated proline and is in intimate proximity of the hydroxylated proline. Arg358 directly interacts with the backbone carbonyls of the hydroxylated proline and Asp330. Trp457 forms a stacking interaction with Arg358 and is hydrogen bonded with the carboxylate group of Asp330. These residues are also completely conserved in P4H-TM orthologs (Fig. S3). The presence of conserved residues linked to P4H activity indicates that the central aspects of P4H function and substrate binding are conserved in P4H-TM.
Four loop structures surround the P4H-TM active site (Fig. 6A). The loop α1-α2 leads from the catalytic domain to the EF domain next to the active site. The βII-βIII loop with an internal disulfide borders the active site cavity on one side. The βIV-βV loop forms interactions with the EF domain, and together with the loop α3-α4 from the EF domain makes the substrate-binding cavity of P4H-TM longer and narrower compared with homologous enzymes. The βVI-βVII loop extends the length of the cavity next to βII-βIII and opposite βIV-βV. The sequences of the P4H-TM loops βII-βIII, βIV-βV, and βVI-βVII are almost completely conserved among vertebrates, and any substrate-interacting residues located there are likely to be preserved (Fig. S3). All loops are present also in Cr-P4H, but they are shorter and the sequence conservation to P4H-TM is very limited. In addition, the βII-βIII loop occupies the active site in the Cr-P4H structure with zinc and pyridine 2,4-dicarboxylate but without the peptide substrate ( Fig. 6C) (3). In HIF-P4H-2, these loops are practically absent, and the position of βII-βIII is occupied by the C-terminal helix that also interacts with the substrate peptide. The cysteines 340 and 357 that form a disulfide in the βII-βIII loop are not conserved elsewhere (Fig. 2). On the other hand, Cys444 is conserved in Cr-P4H and C-P4H-I, but Cys404 is conserved only in Cr-P4H, suggesting that the corresponding disulfide, if present, is formed differently in C-P4H-I (Fig. 2). The disulfide-forming cysteines are completely conserved in P4H-TM orthologs, but some invertebrate proteins have additional cysteines in the βII-βIII loop (Fig. S3).
Electrostatic surface calculation shows that the substrate binding groove is lined with negatively charged residues resulting in an overall negative charge concentrated on two positions (Fig. 6B). The first is at the opening of the cavity where the most prominent acidic residues are Asp386 and Glu387 from βIV-βV loop and Asp434 and Asp437 from βVI-βVII loop. The second is at the other end of the cavity and is composed of glutamates 177, 178, 180, and 181 from the α1-α2 loop. The corresponding residues in Cr-P4H and HIF-P4H-2 are part of a lid structure on top of the substrate peptide (Fig. 6, C-D), suggesting that these glutamates may be involved in a similar role. The acidic residues are nearly always conserved in vertebrate P4H-TM sequences as either aspartate or glutamate residues (Fig. S3). In comparison, Cr-P4H active site also contains negatively charged residues but they are much less prominent than in P4H-TM (Fig. S4). On the other hand, in HIF-P4H-2, the HIFαbinding site is largely positively charged (Fig. S4). The differences in the P4H-TM active site compared with homologs outside the immediate core suggest a different substrate peptide or a different substrate binding mode than in the other enzymes.
Cr-P4H and HIF-P4H-2 peptide-bound structures have a lid structure folded over the substrate peptide (Fig. 6, C-D). In peptide-free structures, this region is either disordered or folded away from the active site (Fig. 6, C-D). In P4H-TM, this Table 1 EF-domain Ca 2+ -coordinating residues of P4H-TM P4H-TM, transmembrane prolyl 4-hydroxylase. P4H-TM structure lid structure seems to be conserved and is formed by the partially disordered α1-α2 loop.
Solution structure of P4H-TM and the impact of calcium The calcium-bound P4H-TM crystal structure does not adequately clarify the role of Ca 2+ in P4H-TM function. However, P4H-TM did not crystallize without Ca 2+ . To overcome this, the conformation change caused by calcium binding was modeled using the structure of calmodulin Nterminal EF-hand pair without calcium. The Ca 2+ loss in calmodulin leads to a relative shift in the positions of the EFhand helices. P4H-TM without calcium was modeled in three ways with the EF domain helices fixed in place at three different positions (Fig. 7, A-C). The model where α3 was fixed resulted in the decrease of the overall length of the protein as the result of Ca 2+ loss (Fig. 7B). This alternative would mostly conserve the substrate-binding cavity and the interactions between the EF domain and the βIV-βV loop. In the two other models, where either α2 position was fixed (Fig. 7A), or α5 was fixed to extend α6 (Fig. 7C), the whole EF domain moved away from the catalytic domain and increased both the size of the substrate-binding cavity and the overall length of the protein. These models would likely disrupt the interactions between the EF domain and the βIV-βV loop, unless the loop is capable of substantial elongation.
To determine the difference in solution structure between Ca 2+ -bound and Ca 2+ -unbound P4H-TM, size-exclusion chromatography (SEC)-small-angle X-ray scattering (SAXS) was measured in the presence and absence of Ca 2+ . P4H-TM eluted as a single peak in both conditions (Fig. S5). Overall, the SAXS data obtained in the two conditions were not very different from each other (Fig. 7D), and there was no substantial difference in the degree of folding in the two conditions as evidenced by the Kratky plot (Fig. 7E). However, both the radius of gyration and the maximum dimension were larger for the sample without Ca 2+ (Table 3, Fig. 7F). This result agrees with the models in Figs. 7, A and C, where P4H-TM adopts a slightly extended conformation in the absence of calcium.
P4H-TM molecular weight calculated from the amino acid sequence is 48.3 kDa. The molecular weight estimates obtained from the SAXS data indicate that the protein eluted from the column as a dimer (Table 3). Ab initio modeling of the SAXS data with P2 symmetry produced extended envelopes with a central bulge (Figs. 7, H and I and S6). The envelope of the protein in the absence of calcium is slightly P4H-TM structure longer corresponding to the larger D max . Superposition of the P4H-TM crystal structure with the SAXS envelopes suggests that the two molecules are positioned side by side at the central bulge and that the extended ends are formed by the Nterminus, the terminal α9 helix, and the C-terminus. It seems likely that loss of calcium would cause a shift in the position of the dimerization interface resulting in a more elongated molecule. Such a shift could result from the movement of the EF-hand helices, as shown for example in Fig. 7C, but for a more detailed analysis, a higher resolution structure without calcium should be obtained.
To further study the oligomerization state, P4H-TM was analyzed by SEC multiangle light scattering (MALS) with and without Ca 2+ . Both samples contained a small amount of aggregated protein (Fig. S7). The molecular weight of the soluble protein fraction varied from 56 to 66 kDa without calcium to 60 to 70 kDa with calcium. These values are higher than the corresponding theoretical value of 48.3 kDa for the monomeric protein calculated from the amino acid sequence and might reflect the apparent concentration-dependent increase in P4H-TM SEC elution volume observed during purification (Fig. S8). In any case, the protein in the SEC-MALS assay eluted always as a single peak corresponding to a mass that is higher than the monomer mass. It can be noted also that the peak is asymmetric, suggesting that possibly in the solution there exists a fast equilibrium between the monomer and dimer.
Mapping of the HIDEA variants to the crystal structure of P4H-TM Five P4H-TM variants have been linked to a severe developmental HIDEA syndrome (Table 4) (21,22). When modeled on the P4H-TM crystal structure solved here, three of the variants clearly destroy the function of the enzyme as they lead

P4H-TM structure
to nearly complete loss of the whole protein or large fragments of the catalytic domain because of frameshift mutations leading to early stop codons and a missense mutation leading to exon 6 skipping (Table 4). His161Pro introduces a proline residue in the place of a surface-facing histidine in the middle of α1 (Fig. 8). As prolines are unable to conform to α-helical geometry, this variant would produce a kink in α1 and probably disrupt its interaction with the DSBH major sheet next to it. A missense mutation causing a stop codon replacing Gln471 leads to the truncation of the protein by 32 residues and cuts off the latter half of α9 (Fig. 8). Although not visible in the crystal structure, the C-terminus of P4H-TM contains an ER retention signal, and its loss might cause P4H-TM to advance beyond the ER in the secretory pathway.

Discussion
P4H-TM is a functionally enigmatic enzyme that is localized at the ER membrane (12,13). It is composed of an Nterminal cytoplasmic tail, a membrane-anchoring transmembrane helix, and a unique combination of a Ca 2+ -binding EF domain and a catalytic domain that is located within the ER lumen. The crystal structure reported here reveals the structure of the soluble part of P4H-TM without the cytoplasmic tail and the transmembrane helix. P4H-TM belongs to the 2OGDD family characterized by the DSBH structural fold. The helix-loop-helix structures of the EF domain are inserted into the middle of the catalytic P4H domain but are structurally distinct from it. In between the catalytic domain and the EF domain forms the substrate binding cavity of the enzyme. The cavity is located above the catalytic center that contains Fe 2+ and the 2OG analogue NOG. The 2OGDD protein family includes single-domain and large multidomain proteins that are highly variable in their substrate specificity, but P4H-TM is the only enzyme in this family that has EFhand motifs (23).
Previous SEC studies proposed that both the full-length P4H-TM and the construct used here (residues 88-502) are dimers with molecular weights around 105 to 120 kDa and around 85 to 90 kDa, respectively (13). The present study suggests that both monomeric and dimeric forms are possible. Two P4H-TM copies were found in the asymmetric unit in the P4H-TM crystals. PISA server analysis suggested that the interaction interface of the two units is extensive but less hydrophobic than expected for interaction, the complexation significance score suggesting that the two units do not form a stable dimer in solution. This implies that the dimer formation could be a crystal packing artifact. The molecular weight from the SEC-MALS analysis was much closer to the weight of a monomeric than of a dimeric protein. In contrast, the solution structure determined with SEC-SAXS resembled the crystallographic dimer and the molecular weight estimated from the SEC-SAXS data indicated a dimer. However, the molecular weight of the SEC elution peak was not homogenous in the SEC-SAXS experiment. The input concentration of P4H-TM was higher in the SEC-SAXS experiment than in the SEC-MALS experiment, and the chromatography column was of a smaller volume in SEC-SAXS than in SEC-MALS, resulting in increased dilution in the latter. This suggests that the oligomerization could be concentration dependent, and this was supported by the concentration-dependent shifting of the SEC elution peak observed during purification, the higher concentration favoring oligomerization. Another possible explanation for the discrepancy is that the SEC-SAXS sample was a mixture of monomers and dimers and the column resolution was not adequate to separate these two states. In the SEC-MALS experiments, a smaller peak was observed eluting before the main peak, possibly corresponding to the dimeric protein.
The presence or absence of calcium made no difference to the oligomerization state.
Based on these results, it is not possible to unequivocally conclude the oligomerization state of P4H-TM. Furthermore, there may be additional interaction sites within the  P4H-TM structure transmembrane or cytoplasmic regions and P4H-TM may also exist as a transient dimer, as shown for some proteins such as the zebrafish SCP-2 thiolase (24). The monomer-dimer exchange could be important for the function of P4H-TM. In the crystallized dimer, the N-terminus and C-terminus are on the same side as the active site, whereas the carbohydrate moieties are on the opposite side of the dimer. The N-terminal domain is providing the membrane anchor, whereas the C-terminus has the ER-retention signal. This topology suggests that the active site faces the membrane, as schematically visualized in Figure 9. The N-linked glycans are located on the opposite side of the catalytic domain (pointing to the ER lumen), indicating that especially in the dimer form they will not take part in substrate binding (Fig. 9). The binding of calcium did not affect the oligomerization; however, it may regulate the conformation of the active site because calcium binding resulted in a more compact form. Finally, the SEC-SAXS model of the P4H-TM dimer suggested the enzyme to form an extended shape that would be attached to the membrane from both ends, and where both active sites of the dimer would be in a similar orientation adjacent to the ER membrane (Fig. 9). Such a model highlights the fact that very little is currently known about the role and possible interaction partners of the cytoplasmic N-terminal domain of P4H-TM.
P4H-TM variants have been found to cause the HIDEA syndrome. Its symptoms include hypotonia, severe intellectual disability, epilepsy, and eye abnormalities (21,22). Some of the reported variants will clearly lead to P4H-TM loss of function as they result in the loss of large fragments of the enzyme, including the active site. For the variants His161Pro and Gln471*, previous analysis indicated decreased protein solubility (22). The structural analysis revealed that His161 is located within the α1 helix in the P4H-TM structure. Introduction of a proline within the helix will produce a bend to the helix. α1 forms conserved hydrophobic interactions with the major sheet of the DSBH fold and the α7 helix, and a conserved salt bridge between helices α1 and α7. Therefore, disruption of α1 probably leads to the destabilization of the enzyme. The early stop codon replacing Gln471 removes a part of the C-terminal helix α9 and any subsequent residues that were not visible in the crystal structure. The sequence of the terminal helix and residues following Gln471 are not strongly conserved either in the P4H-TM homologs or orthologs, suggesting the critical effect of this variant may be the loss of the ER retention signal at the C-terminus of the protein.
In addition to the 502-residue isoform reported initially (12,13), some databases now include additional isoforms of P4H-TM that would be derived from alternative splicing. A 563residue isoform 3, resulting from missplicing of exons 6 and 7, has been suggested to be the "canonical" isoform and has been used as the template for the antibody epitope for P4H-TM in Human Protein Atlas (http://www.proteinatlas.org/) (25). However, there is no published evidence of this isoform appearing in the protein form and there are no peptides listed in PeptideAtlas (26) that correspond to the 61 residues unique for this isoform. This 563-residue isoform was previously found not to be expressed in human fibroblasts or myoblasts (22). Furthermore, we were unable here to express and purify this isoform using the insect cell expression system, and the interpretation of the P4H-TM crystal structure suggests that the 563-residue isoform would not preserve the conserved structural core of the enzyme. Together these data indicate that the 563-residue isoform is likely to be a splicing artifact that is not translated into a functional enzyme.
Two groups of P4Hs are found in animals. C-P4Hs hydroxylate specific prolines in procollagen chains and enable the formation of the stable triple-helical structure (5,9). HIF-P4Hs hydroxylate two prolines in HIFα proteins, leading to their proteasomal degradation (8,10). P4H-TM has characteristics of both groups. It is localized to ER like C-P4Hs and its amino acid sequence is more similar to C-P4Hs than HIF-P4Hs (12,13). On the other hand, it does not hydroxylate prolines in collagen or HIFα peptides but has some activity toward the ODDD of HIF1α (13), and it contributes to HIF1α degradation and regulation of erythropoietin (19). However, it has not been thoroughly clarified what exactly is the role of P4H-TM in HIF regulation. In addition to animals, P4Hs are found in plants, algae, bacteria, and viruses (27). A handful of residues in the P4H active site are conserved in all or nearly all of these enzymes. Although some of the conserved residues P4H-TM structure function to preserve the conserved structural fold, several of them are linked to the P4H catalytic activity. P4H-TM active site resembles the classical active site of 2OGDD family enzymes (28). It contains a divalent iron, central to the catalytic activity, coordinated by two histidines and an aspartate residue and the 2OG analogue NOG. Although a precise enzymatic mechanism for any P4H has not been described, the similarities between the P4H-TM active site and the active sites of Cr-P4H and HIF-P4H-2 described here extend beyond the iron and 2OG coordinating residues to the residues that interact with the substrate peptide and the proline to be hydroxylated. This confirms that P4H-TM is indeed a P4Hs, most likely with a peptide substrate.
P4H-TM structure revealed an extensive substrate binding cavity between the EF domain and the catalytic domain. The cavity is bordered by loop structures extending from the DSBH core of the catalytic domain. The sequences of these loops are not conserved in the homologous P4Hs but are strongly conserved among the P4H-TM orthologs. Therefore, these loops are likely to participate in P4H-TM substrate binding or activity regulation. Cr-P4H and HIF-P4H-2 structures have been described both in the presence and absence of the peptide substrates and both enzymes have similar lid structure folding over the substrate peptide (3,(29)(30)(31). In this P4H-TM structure, the α1-α2 loop is partially disordered and positioned like a lid in an open conformation. It seems likely to be capable to form a lid structure over a substrate peptide when one is bound. In P4H-TM homolog structures, the lid residues interact with the substrate peptide, and in HIF-P4H-2, they are also known to contribute to the substrate specificity (29,31). Interestingly, the residues of the lid structures are not conserved between P4H-TM and the other P4Hs. The analysis of the electrostatic surface near the P4H-TM active site, and its comparison with HIF-P4H-2, revealed that the P4H-TM active site contains abundant negative charge, whereas HIF-P4H-2 contains mainly positive charge. These data, together with the previous results showing that P4H-TM does not hydroxylate HIF-P4H substrate peptides, suggest that P4H-TM has evolved to bind a different substrate than HIF-P4H-2 (13).
The cell stores calcium in mitochondria and ER. The Ca 2+ concentration within the ER lumen has been measured to be between 100 and 800 μM, whereas concentrations as low as 1 μM have been reported in human cells during Ca 2+ mobilization (32, 33). Ca 2+ K d for P4H-TM was measured here to be 23 μM, indicating that P4H-TM would be saturated with calcium in physiological situation, but that this saturation could be sensitive to changes, such as when calcium is temporarily released from the ER to the cytosol. The EF hand containing proteins often regulate enzyme activity in response to changes in cellular calcium concentration. However, the role of the EF domain in P4H-TM is not currently fully understood. In a previous study, 5-mM CaCl 2 was included in the activity assays where P4H-TM was found to be inactive toward HIF1α or collagen peptides and active toward the HIF1α ODDD (13). P4H-TM seems to be able to adopt both monomeric and dimeric forms in the solution, but calcium had no effect on the oligomerization status. Some EF-hand motifs adopt an unstructured molten globule conformation in the absence of calcium and only form regular secondary structure when calcium is bound (15). However, synchrotron radiation CD measurements did not find any major shift in the secondary structure of P4H-TM when calcium was added. On the other hand, a near-UV CD measurement produced a major shift in the tryptophan region at 285 to 305 nm, indicating a movement induced by the EFhand calcium binding in the position of some of the tryptophans in the enzyme. This movement is likely to arise from the two tryptophans found in the α3-α4 loop between the two EF hands that is expected to shift the position when the enzyme adopts the calcium-bound conformation seen in the crystal structure. The α3-α4 tryptophans are located near the active site cavity, and a conformation change in this region in response to calcium binding could indicate changes in the active site relevant to enzymatic activity. Some EF-hand motifs also bind magnesium (15). We did not see a shift in Figure 9. Modeling of the effect of dimerization to the P4H-TM structure. Full-length monomeric P4H-TM is only tethered by the transmembrane helix and its orientation regarding the ER membrane is variable. On the other hand, the P4H-TM dimer has two transmembrane anchors, and its orientation is fixed with both substrate-binding cavities facing the membrane oriented so that the β4 strand side of the major sheet is found closest to the membrane. In this model, the C-terminal helix with the ER retention signal is also pointing to the membrane. ER, endoplasmic reticulum; P4H-TM, transmembrane prolyl 4-hydroxylase.
P4H-TM structure the near-UV CD signal when magnesium was added and we were unable to crystallize P4H-TM when calcium in the crystallization condition was replaced with magnesium, suggesting that P4H-TM does not bind magnesium. Because P4H-TM could not be crystallized in the absence of calcium, the effect of the loss of calcium on the P4H-TM structure was modeled based on the structure of the apo form of calmodulin. Three different models were generated where either (1) the initial α2 helix or (2) the subsequent α3 helix was fixed in place, or where (3) the α5 helix was modeled to extend the α6 helix. Two of these models led to an extended overall structure for P4H-TM where the EF domain was seen to move away from the catalytic domain, causing widening of the active site cavity. Such a model would likely have an impact on the catalytic activity of P4H-TM. The SEC-SAXS results indicated that P4H-TM without calcium adopts a slightly extended conformation where both the radius of gyration and the maximum dimension are slightly larger than those of the calcium-bound P4H-TM. However, because P4H-TM in the SAXS data was found to be a dimer, it is not clear if the extended conformation is caused by the elongation of both monomers or the reorganization of the dimer interface.
In conclusion, the solved 3D structure of P4H-TM indicates that it shares the key structural elements of the known P4Hs, confirming that it is a true P4H while possessing a unique property among the 2OGDDs having an EF domain and a catalytic activity potentially regulated by Ca 2+ .

Cloning, expression, and purification
The cloning of the P4H-TM construct has been described previously (13). Briefly, the construct contains 17 N-terminal residues (MLRRALLCLAVAALVRA) from the ER localization signal of protein disulfide isomerase that are cleaved upon import to the ER, six histidine residues and the residues 88 to 502 of human P4HTM within the pVL1392 expression vector. To improve the expression yield, this construct was subcloned into pFastBac dual vector (Invitrogen) and the Bac-to-Bac protocol was used with the EmBacY Escherichia coli strain (34) to generate P4H-TM bacmids. P4H-TM isoform 3 (MGC clone 3940241) was obtained from the Genome Biology Unit at the University of Helsinki, Finland. Isoform 3 DNA sequence was amplified with PCR; the amplified DNA and the P4H-TM isoform 1 plasmid were both digested with BlpI (NEB) and XbaI (NEB) restriction enzymes, and the isoform 1 sequence was replaced with isoform 3. Bacmids were transfected to Sf9 cells using baculoFECTIN II transfection reagent (Oxford Expression Technologies). Resulting viruses were used to infect Sf21 expression cultures in insect-Xpress media (Lonza). Expression culture cells were harvested 48 h after proliferation arrest, washed with PBS, and frozen in −80 C.
Frozen cells were thawed and resuspended in the lysis buffer containing 10-mM Tris HCl, pH 7.8, 0.1 M glycine, 0.1 M NaCl, 20-mM imidazole, 2-mM CaCl 2 , 20-μM FeSO 4 , 0.1% Triton X-100, and 1x protease inhibitor cocktail (Roche). The cell suspension was homogenized, the insoluble fraction was pelleted by centrifugation, and the soluble fraction was applied to a His-trap Ni 2+ -affinity column or a gravity-flow Ni 2+ -NTA column. The column was washed with 10-mM Tris HCl, pH 7.8, 0.1 M glycine, 0.1 M NaCl, 20-mM imidazole, 2-mM CaCl 2 , and 20-μM FeSO 4 , and the bound proteins were eluted with a similar buffer with 0.3 M imidazole. The eluted fractions were analyzed with SDS-PAGE, and the P4H-TM-containing fractions were pooled, concentrated, and purified with SEC using 10-mM Tris HCl, pH 7.8, 0.1 M glycine, 0.1 M NaCl, 2-mM CaCl 2 , 20-μM FeSO 4 as the eluate. The protein samples used to analyze Ca 2+ interaction were purified in the same way, but without CaCl 2 or FeSO 4 , and 1-mM EDTA was added after elution from the Ni 2+ -affinity column.

MALS
Molecular mass and sample quality of P4H-TM with and without Ca 2+ were analyzed with a miniDAWN MALS device (Wyatt Technology Corporation) connected to a Shimadzu HPLC unit (Shimadzu Corporation) with a Superdex 200 Increase 10/300 GL SEC column (GE Healthcare Life Sciences) at constant 10 C temperature and equilibrated with the SEC buffer with and without 2-mM CaCl 2 . The sample concentration was 3.3 mg/ml, and the flow rate 0.5 ml/min. The RID-10A refractive index detector (Shimadzu Corporation) connected to the HPLC system was used as a concentration source for the calculations. ASTRA software (version 7.3.1.) (Wyatt Technology Corporation) was used to calculate the molecular weight and polydispersity of the samples.

CD spectroscopy
Synchrotron radiation CD spectra were collected from 0.3 mg/ml samples at AU-CD beamline at ASTRID2 synchrotron source (ISA) The samples were prepared to a buffer with 1-mM Tris HCl, pH 7.8, 10-mM NaCl, and 10-mM glycine. Two millimolar of CaCl 2 was added right before the measurement. The samples were equilibrated to RT and applied into 0.1-mm pathlength closed quartz cuvettes (Suprasil, Hellma Analytics). The spectra were recorded from 170 nm to 280 nm, at 25 C. Three repeat scans per measurement were recorded. The spectra were processed, and the baselines were subtracted using CDToolX (35).
Near-UV CD spectra were collected using a Chirascan CD spectrometer (Applied Photophysics) between 250 and 350 nm at RT using a 1-cm pathlength quartz cuvette. The CD measurements were acquired every 1 nm with 1 s as an integration time and repeated three times with baseline correction. For the near-UV measurement, P4H-TM was diluted so that the absorbance at 280 nm was 1. The samples were measured so that the protein in the SEC buffer without metal was measured first, after which CaCl 2 or MgCl 2 was added to the cuvette to 2-mM concentration, and the sample was measured again. The data were analyzed with Pro-Data Viewer (Applied Photophysics).

P4H-TM structure
Crystallization and data collection P4H-TM crystals were grown using the sitting-drop vapordiffusion method. The drops (200-nl protein solution and 100nl well solution) were made with the Mosquito nanodispenser (TTP Labtech) and imaged using the Formulatrix RI27 plate hotel at 4 C at the Structural Biology Core Facility at Biocenter Oulu. The crystallization results were monitored using the in-house IceBear software (Daniel et al., manuscript in preparation). The protein concentration was 3 mg/ml, and the buffer was the same used for the SEC analyses including 2-mM CaCl 2 and 20-μM FeSO 4 . The well solution was 0.1 M Tris HCl, pH 9, 22% tert-butanol, and 1-mM NOG (Sigma). The crystals were soaked briefly in a solution containing 0.1 M Tris HCl, pH 9, 5% tert-butanol, 20% 2-methyl-2,4-pentanediol, and 1-mM NOG, before flash-freezing in liquid nitrogen. Diffraction data were collected at the beamline P13 operated by the European Molecular Biology Laboratory (EMBL) Hamburg at the PETRA III storage ring (DESY) (36). Crystals suffered radiation damage during data collection and a minimal number of images (collected at the beginning of the exposure time) that produced a complete data set were used in the final data processing calculations.

Data processing and structure refinement
Data were processed in XDS (37). Molecular replacement was performed with Phaser using a single molecule of Cr-P4H (PDB ID: 2jig), modified with Phenix.sculptor as a search model (3,38,39). The correct solution with two molecules in the asymmetric unit had log-likelihood gain of 285 and translation function Z-score equivalent of 18.2. The model was built initially using Phenix.autobuild followed by several cycles of manual building in COOT and structure refining using Phenix.refine (40)(41)(42). The resolution cutoff 2.25 Å was determined using paired refinement within PDB-redo server (43,44). For comparison, the mean I/sigma(I) reached 2.0 in the resolution shell between 2.79 and 2.69 Å, and the CC½ for this shell was 0.803. The structure was validated using Mol-Probity and PDB validation server (45,46). The glycan conformations were validated using the pdb-care server (47). Data processing and refinement statistics are shown in Table 5.

ITC
P4H-TM and Ca 2+ interaction was studied using ITC at the Proteomics and Protein Analysis Core Facility at Biocenter Oulu. Ca 2+ -free P4H-TM was exchanged with SEC to a buffer containing 10-mM Tris HCl, pH 7.8, 0.1 M glycine and 0.1 M NaCl. CaCl 2 was dissolved in the same buffer, diluted to 4 mM, and injected to 120-μM P4H-TM at 25 C using ITC200 instrument (MicroCal). The binding was analyzed with MicroCal Origin using 'one set of sites' binding model.

SAXS
SAXS measurements were performed at the P12 beamline at PETRA III in Hamburg (59). P4H-TM purified in the absence of calcium was passed through Superdex 200 Increase 5/150 P4H-TM structure column with 0.2 ml/min flow rate at RT. Scattering was measured from the eluted buffer and protein sections. Identical samples with 7.4 mg/ml concentration were applied to the system in two buffers which both contained 10-mM Tris HCl, pH 7.8, 0.1 M NaCl, 0.1 M glycine, and 1% (w/v) glycerol, and one of which also contained 2-mM CaCl 2 . The protein and buffer frames were selected for processing using CHROMIXS (60). Buffer frames were averaged, and the averaged buffer intensity was subtracted from the individual protein frames. The subtracted protein frames were then scaled and averaged. The data were processed and the Bayesian inference molecular weight estimates were obtained using PRIMUS (61,62). 20 ab initio models were generated for both data sets using GASBOR (63). The models were averaged using DAMAVER, and the most typical model was selected (64). Experimental scattering curves were compared with curves calculated from the P4H-TM structure monomer and dimer with CRYSOL (65). The processing software was part of the ATSAS package, version 3 (66).

Data availability
Transmembrane prolyl 4-hydroxylase crystal structure has been submitted to Protein Data Bank with the identification number 6tp5. All other data are contained within the article and the supporting information.