Molecular architecture and domain arrangement of the placental malaria protein VAR2CSA suggests a model for receptor binding

VAR2CSA is the placental-malaria specific member of the antigenically variant Plasmodium falciparum erythrocyte membrane protein 1 (PfEMP1) family. It is expressed on the surface of Plasmodium falciparum infected host red blood cells and binds to specific chondroitin-4-sulfate (CSA) chains of the placental proteoglycan receptor. The functional ~310 kDa ectodomain of VAR2CSA is a multi-domain protein that requires a minimum 12-mer CSA molecule for specific, high affinity receptor binding. However, how these domains interact to create the receptor binding surface is not known, limiting efforts to exploit its potential as an effective vaccine or drug target. Using small angle X-ray scattering and single particle reconstruction from negative stained electron micrographs of the ectodomain and multidomain constructs, we have determined the structural architecture of VAR2CSA. The relative location of the domains creates two distinct pores that can each accommodate the 12-mer of CSA, suggesting a model for receptor binding. This model has important implications for understanding cytoadherence of IRBCs and potentially provides a starting point for developing novel strategies to prevent and/or treat placental malaria.

As a result of frequent Pf infections, most adults in malaria endemic areas have acquired protective anti-PfEMP1 antibodies to parasite strains expressing various PfEMP1s, except the placental-specific one, VAR2CSA (22). During the first pregnancy, Pf exploits the development of the placenta to overcome the host's pre-existing immunity by expressing VAR2CSA and sequestering in the placenta by binding to specific CSA chains of the chondroitin sulfate proteoglycan (CSPG) receptor. Subsequently, rapid parasite multiplication contributes to placental dysfunction, maternal anemia, preterm delivery, low neonate weight and maternal and pediatric morbidity and mortality (23,24); collectively these conditions are referred to as placental malaria (PM) (25-28). Women infected with Pf during prior pregnancies produce anti-VAR2CSA antibodies (29) and thus resist PM development in subsequent pregnancies (30), suggesting that VAR2CSA is a suitable therapeutic target. However, the parasite expresses polymorphic VAR2CSA, which poses challenges in the development of long-lasting efficacious vaccines against diverse CSA-binding isolates. Even so, placental sequestration of IRBCs in the intervillous spaces of the placenta and on the syncytiotrophoblast surface is an obligate step in the pathology of PM (31-35) that is facilitated by VAR2CSA binding to the host CSA chains of the CSPGs. Therefore, understanding the nature of the CSA:VAR2CSA interaction is a critical step toward developing effective therapeutic strategies that prevent IRBC cytoadherence in the placenta (36).
VAR2CSA is a ~350 kDa membrane protein consisting of a large ~310 kDa non-glycosylated extracellular ectodomain, a single pass transmembrane helix, and a ~42 kDa cytoplasmic acidic terminal sequence that interacts with a number of host cell and parasite-derived proteins (34). The functional CSA-binding ectodomain is comprised of six Duffy-binding-like (DBL) domains (DBL1x, DBL2x, DBL3x, DBL4ε, DBL5ε and DBL6ε) and two interdomains (ID1 and ID2/CIDRPAM), connected by short linker sequences (Fig. 1A). Together, these domains form a high affinity ligand binding site with specificity for a minimum 12-mer CSA that has a characteristic, low C4 sulfate content (lsCSA) (31,33,37). In order to gain structural information, small angle X-ray scattering (SAXS) studies were performed using HEK-expressed, engineered non-glycosylated and baculovirus-expressed glycosylated ectodomains (38,39).
However, the resultant molecular envelopes were relatively featureless, preventing identification of individual domains or insight into carbohydrate binding. To date, there is no high-resolution structure of the VAR2CSA ectodomain, although crystal structures of E. coli-expressed DBL3x (40,41), DBL6ε (42,43) and the tandem DBL3x-DBL4ε domains (44) reveal a conserved fold comprised of three subdomains stabilized by multiple intra-domain disulfide bonds. Despite this, the detailed tertiary structure that forms the functional, high affinity CSA-binding surface remains unclear. It is generally agreed that domains DBL1x, ID1, DBL2x and half of ID2 (ID2a) are involved (38,39,45-47), but it is unclear whether they comprise the entire surface area for CSA binding. Therefore, to understand the details of carbohydrate binding it is essential to understand the structural architecture of VAR2CSA.
In the present study, we have expressed and characterized, structurally and functionally, the VAR2CSA ectodomain and a set of N- and C-terminal deletion constructs (Fig. 1). These proteins are folded and thermally stable, and NTS-DBL6ε and DBL1x-ID2a can specifically bind the lsCSA chains of CSPG with nM affinity. Further, using a combination of size exclusion chromatography in-line with SAXS (SEC-SAXS), single particle reconstruction of negative stained electron micrographs and basic homology modeling, we have determined the relative locations of the DBL domains and produced a validated model of the VAR2CSA ectodomain. Importantly, these studies reveal, for the first time, a defined molecular shape with distinctive pores that traverse the molecule and suggest a credible model for CSA binding.

Production, purification and characterization of VAR2CSA constructs
The codon-optimized synthetic gene encoding the VAR2CSA ectodomain (NTS-DBL6ε) of P. falciparum 3D7 strain and four deletion constructs corresponding to DBL1x-ID2a, ID2b-DBL6ε, DBL3x-DBL6ε and DBL4ε-DBL6ε (as defined in Fig. 1A and Table S1) were expressed in HEK 293-F cells. The proteins are produced as secreted monomers and purified using a combination of ultrafiltration, nickel-affinity, cation-exchange, and size-exclusion chromatography. After purification, a single band, at the expected apparent molecular mass, is observed upon SDS-PAGE under reducing conditions (Fig. 1B).
Similarly, under non-reducing conditions a single band is observed (Fig. 1C). The purity of each protein construct, as assessed by the SDS-PAGE band profiles, is ~98%. Yields of the purified recombinant proteins range from 0.5-0.7 mg/l for NTS-DBL6ε to 14 mg/l for DBL4ε-DBL6ε. Several additional deletion constructs (DBL1x-DBL5ε, DBL1-ID2b, ID1-DBL6ε, ID1-ID2) were tested for protein production in our system, but these either failed to express or to purify as folded monomers and thus were not analyzed further.
To assess the folding of the purified recombinant proteins, far UV CD spectra were recorded at 25 °C and evaluated. A strong maximum at 200 nm and troughs at 208 nm and 222 nm are observed (Fig. S1A) that are characteristic of proteins containing substantial α-helical secondary structures. The spectra are similar to results reported for VAR2CSA constructs DBL6ε, DBL4ε-DBL6ε and DBL1x-DBL6ε (38).
The stability of each construct was investigated by following the changes in ellipticity at 222 nm as a function of temperature between 25-85 °C. In all cases, a transition occurs between 70 and 75 °C (Fig. S1B), demonstrating that these constructs are thermally stable. The observed transition temperatures are similar to those reported earlier for analogous constructs (38). However, for each construct, the spectrum collected at 90 °C still shows residual ellipticity at 208 nm and 222 nm (Fig. S1C-G), demonstrating that unfolding is incomplete, preventing a thermodynamically-justified value for Tm from being calculated.
Indeed, the complete unfolding of NTS-DBL6ε is achieved only in the presence of the reducing agent tris(2-carboxyethyl) phosphine (Fig. S1H). These results are consistent with the constructs being stable, as expected for domains containing a large number of disulfide bonds (48).
The binding of each VAR2CSA construct to CSPG fraction II (CSPG-II) purified from bovine cornea (49), which contains chondroitin sulfate chains having low levels of 4-sulfate groups, was assessed in an ELISA-based assay (Fig. S2A). NTS-DBL6ε binds with a high affinity (KD 19 ± 1 nM), whereas DBL1x-ID2a binds with a much lower affinity (KD 113 ± 24 nM). None of the other deletion constructs (ID2b-DBL6ε, DBL3x-DBL6ε and DBL4ε-DBL6ε) shows measurable binding in this assay. Competition studies with NTS-DBL6ε and DBL1x-ID2a using bovine tracheal CSA and CSC, which contain different relative proportions of CSA, show that CSA efficiently inhibits the binding of both constructs to CSPG-II. CSC, which contains 10-20% CSA and 80-90% C6S, shows relatively lower inhibition. HA, a polymer of disaccharides composed of D-glucuronic acid and N-acetyl-D-glucosamine with no 4-sulfated groups, shows no measurable inhibition (Fig. S2B). Together, these results demonstrate that the purified NTS-DBL6ε and DBL1x-ID2a bind low sulfated CSA and that the former binds with higher affinity than the latter.
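To make the difference between the two reported KD values concrete, the following minimal sketch computes fractional occupancy across a 2-fold dilution series. It assumes a simple one-site (Langmuir) binding model, which is an assumption made here for illustration; the text does not state which model was used to fit the ELISA data. The top concentration of 400 nM and the number of dilution points are hypothetical.

```python
# Minimal sketch, assuming one-site (Langmuir) binding; the fitting model
# actually used for the ELISA data is not stated in the text.

def fraction_bound(conc_nM, kd_nM):
    """Fractional occupancy of the receptor at a given protein concentration."""
    return conc_nM / (kd_nM + conc_nM)

def twofold_dilutions(top_nM, n_points):
    """Serial 2-fold dilution series starting from top_nM."""
    return [top_nM / (2 ** i) for i in range(n_points)]

KD_ECTODOMAIN = 19.0   # nM, NTS-DBL6ε (value reported in the text)
KD_FRAGMENT = 113.0    # nM, DBL1x-ID2a (value reported in the text)

# Hypothetical dilution series: 400 nM top concentration, 8 points.
series = twofold_dilutions(top_nM=400.0, n_points=8)

for conc in series:
    full = fraction_bound(conc, KD_ECTODOMAIN)
    frag = fraction_bound(conc, KD_FRAGMENT)
    print(f"{conc:8.2f} nM  ectodomain {full:.2f}  DBL1x-ID2a {frag:.2f}")
```

Under this model each construct reaches half-maximal binding at a concentration equal to its KD, so the curve for DBL1x-ID2a is shifted roughly six-fold to higher concentration relative to the full ectodomain.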

SAXS analysis suggests that the VAR2CSA ectodomain is compact.
In order to gain insight into the structural organization of the ectodomain, NTS-DBL6ε and the deletion constructs, DBL1x-ID2a, DBL3x-DBL6ε and DBL4ε-DBL6ε, were characterized using SEC-SAXS (Table 1, Figs. 2, S3). The data were processed using the ATSAS Suite (50,51) and the values obtained from this analysis compared to those previously reported, if available (Table 1 and Figs. 2A-D, S3). The data for all of the constructs are linear in the Guinier region of the SAXS curve over the given qRg ranges (Fig. 2E), indicating that the samples are monodisperse. The Rg value obtained for DBL1x-ID2a is smaller than that reported previously (39), which is consistent with the removal of small amounts of dimer and higher order aggregates during chromatography and the absence of glycosylation in our constructs. It should be noted that, although the exact domain boundaries are slightly different (Fig. 1A), this modification alone would not account for the observed differences in Rg values. For each construct, assessment of the molecular weight (Table 1) using a consensus Bayesian method, implemented in ATSAS 3.0 (52), is in excellent agreement with both the theoretical value and those obtained by SDS-PAGE, providing further support that each of the proteins exists as a monomer in solution. For NTS-DBL6ε, DBL1x-ID2a, and DBL3x-DBL6ε, the pair-distribution function, P(r), approaches zero smoothly at r = 0 Å and at Dmax, and has a single peak consistent with relatively globular molecules (Fig. S3A). By contrast, the P(r) function for DBL4ε-DBL6ε contains an asymmetric peak with a maximum at ~42 Å and a shoulder centered at ~62 Å, indicative of a molecule that contains spatially distinct modules, with the first peak due to the distribution of scattering centers (chords) within a module and the shoulder reflecting the chords between the modules. The dimensionless Kratky plots further support this analysis (Fig. S3B).
NTS-DBL6ε and DBL1x-ID2a have similar plots that contain a single peak at qRg = ~1.7 and (qRg)²I(q)/I(0) = 1.1, in agreement with theoretical values of 1.75 and 1.1 for a globular protein (53), respectively, and returning to zero at qRg ~5 (Fig. S3B). The location of the peak is consistent with each construct behaving in solution as a single module. By contrast, the plots for DBL3x-DBL6ε and DBL4ε-DBL6ε contain peaks that are shifted to higher qRg (1.8-1.95), indicative of a more elongated molecule. In addition, the plots for DBL3x-DBL6ε and DBL4ε-DBL6ε contain a shoulder at qRg ~4.9 and qRg ~4, respectively, and return to zero at qRg ~10, characteristic of proteins containing spatially distinct modules (54).
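The Guinier and dimensionless Kratky analyses described above can be illustrated with a short sketch. It is not the ATSAS implementation: it fits ln I(q) against q² by ordinary least squares on a synthetic, noise-free Guinier-type curve, and then locates the peak of (qRg)²I(q)/I(0). The Rg of 50 Å, the q grid and the qRg ≤ 1.3 fitting limit are assumptions chosen for illustration only.

```python
import math

def guinier_fit(q, intensity):
    """Least-squares line through ln I(q) versus q^2; returns (Rg, I0).

    Uses the Guinier approximation ln I(q) = ln I(0) - (Rg^2 / 3) q^2.
    """
    x = [qi * qi for qi in q]
    y = [math.log(ii) for ii in intensity]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.sqrt(-3.0 * slope), math.exp(intercept)

def dimensionless_kratky(q, intensity, rg, i0):
    """Return (qRg, (qRg)^2 I(q)/I(0)) pairs."""
    return [(qi * rg, (qi * rg) ** 2 * ii / i0) for qi, ii in zip(q, intensity)]

# Synthetic, noise-free Guinier-type scatterer (hypothetical Rg of 50 Å).
RG_TRUE = 50.0
I0_TRUE = 1.0
q = [0.001 * k for k in range(1, 101)]  # Å^-1, so qRg runs from 0.05 to 5
intensity = [I0_TRUE * math.exp(-((qi * RG_TRUE) ** 2) / 3.0) for qi in q]

# Restrict the fit to the low-q region (qRg <= 1.3, an assumed limit).
low_q = [(qi, ii) for qi, ii in zip(q, intensity) if qi * RG_TRUE <= 1.3]
rg, i0 = guinier_fit([p[0] for p in low_q], [p[1] for p in low_q])

kratky = dimensionless_kratky(q, intensity, rg, i0)
peak_x, peak_y = max(kratky, key=lambda point: point[1])
print(f"Rg = {rg:.2f} Å, I(0) = {i0:.3f}")
print(f"Kratky peak at qRg = {peak_x:.2f}, height = {peak_y:.3f}")
```

For this idealized scatterer the peak falls near qRg = √3 with a height near 3/e, which is the reference point against which the shifted peaks of the more elongated constructs are judged.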

Ab initio bead models of VAR2CSA constructs reveal their overall shape.
For each construct, an ab initio averaged/filtered bead model was calculated from the models of 20 individual simulations with DAMMIF in ATSAS 3.0. The simulated SAXS curves for these models are in good agreement with the data, as judged by the χ2 values (Table 1). For NTS-DBL6ε, the individual calculated models in the ensemble are similar (Table 1). The bead models of the fragments were then fit into NTS-DBL6ε using Chimera, placing DBL3x-DBL6ε first because it was the largest fragment. Although individually, each could occupy a range of positions, there was only one arrangement in which DBL3x-DBL6ε and DBL1x-ID2a fit into the full-length envelope simultaneously without a large conformational rearrangement (Fig. 2F). DBL4ε-DBL6ε fits inside DBL3x-DBL6ε leaving an additional volume that, presumably, corresponds to DBL3x; however, it could not be placed in a unique orientation. Thus the SAXS data are consistent with DBL1x-ID2a occupying the base of the molecule and DBL3x-DBL6ε occupying the length of the molecule including a thinner head region.
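The χ2 statistic used above to judge agreement between a model curve and the experimental data can be sketched as follows. This is a generic reduced χ2 with a least-squares scale factor between the calculated and experimental intensities; the exact normalization used by the ATSAS programs may differ, and the three-point data set is invented purely for illustration.

```python
# Minimal sketch of a reduced chi-square between experimental and model
# SAXS intensities. The normalization (N - 1) is an assumption; the
# programs used in the study may define it slightly differently.

def scale_factor(i_exp, i_calc, sigma):
    """Least-squares scale c minimizing sum(((I_exp - c*I_calc)/sigma)^2)."""
    num = sum(e * c / (s * s) for e, c, s in zip(i_exp, i_calc, sigma))
    den = sum(c * c / (s * s) for c, s in zip(i_calc, sigma))
    return num / den

def reduced_chi2(i_exp, i_calc, sigma):
    """Error-weighted squared residuals after scaling, divided by N - 1."""
    c = scale_factor(i_exp, i_calc, sigma)
    n = len(i_exp)
    total = sum(((e - c * m) / s) ** 2 for e, m, s in zip(i_exp, i_calc, sigma))
    return total / (n - 1)

# Invented example: the model differs from the data only by a scale of 2.
i_exp = [10.0, 8.0, 5.0]
i_calc = [5.0, 4.0, 2.5]
sigma = [1.0, 1.0, 1.0]
print(scale_factor(i_exp, i_calc, sigma), reduced_chi2(i_exp, i_calc, sigma))
```

A value close to 1 indicates that the model reproduces the data within the experimental errors, whereas much larger values indicate systematic disagreement.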

VAR2CSA.
In order to generate a higher resolution model than possible by SEC-SAXS analysis, single particle reconstruction of negative stained images was calculated for all of the constructs. During data collection, examination of regions throughout the grids revealed that all constructs produced both positively and negatively stained images on the same grid and, in some cases, within the same field. Therefore, data were collected from areas of the grids in which negatively stained particles dominated. and classes corresponding to side-on images of complexes between NTS-DBL6ε and the Y-shaped mAb (Fig. 3G). In this final set of 2D-classes, the Y-shaped mAb is clearly visible at the end of the head, providing an unambiguous identification of the location of the C-terminal cMyc tag adjacent to the head domain (small lobe). It follows that DBL6ε is located in the density adjacent to the bound antibody at the free end of the head, furthest away from the neck and body. The relatively short linker between DBL5ε and DBL6ε (~10-20 amino acids) and the clear connectivity of the density in the head suggest that the remaining head density arises from DBL5ε.
Particles selected from the best 2D classes were used to reconstruct the 3D volume of NTS-DBL6ε (Fig. 4A; Table 2). This reconstruction shows that the ectodomain adopts a slightly extended conformation with overall dimensions 175 Å x 160 Å x 110 Å (Fig. 4A) that are slightly larger than those estimated from the corresponding SAXS bead model. Consistent with the views in the 2D classes, the 3D volume is comprised of three layers that are together broadly reminiscent of a duck, with a head, body, feet and a tail ( To further assign volumes in L2, single particle reconstructions of DBL3x-DBL6ε (Fig. S8) and DBL4ε-DBL6ε (Fig. S9) were calculated and aligned with that of ID2b-DBL6ε (Fig. 6A). The DBL3x-  (Table 2 and Fig. 6A, Fig. S9). The relatively weaker stain-excluding densities connecting L1 and L2 observed in ID2b-DBL6ε are less striking in the DBL3x-DBL6ε and DBL4ε-DBL6ε reconstructions; however, the corresponding maps and projections are consistent with the observed 2D classes (Fig. S8). The weaker connecting density is likely due to a combination of a lower signal-to-noise ratio of images of these constructs and, potentially, relatively more conformational freedom (Table 2). The volumes of DBL3x-DBL6ε and DBL4ε-DBL6ε were fit into that of ID2b-DBL6ε using an automated fitting algorithm in Chimera (Fig. 6A). Since the distinctive volume of DBL5ε and DBL6ε in L1 is common to all N-terminal deletion constructs, it provided a visual constraint for assessing fit. In each case, applying this constraint, the placement of volumes is unique and reveals that in L2, DBL4ε is located below both DBL5ε and DBL6ε in the body and DBL3x forms the tail (Fig. 6A). It should be noted that for all constructs containing DBL4ε, DBL5ε and DBL6ε, the domains occupy equivalent locations in all maps; however, in the DBL3x-DBL6ε map, the volume corresponding to DBL6ε is rotated by ~45° relative to its location in NTS-DBL6ε and ID2b-DBL6ε.
The distinctly different orientation of the terminal domain is also evident when comparing some 2D classes (Fig. 6B, bottom panel). Further, although the two arms of the S-shape are better defined in the volume of ID2b-DBL6ε (Fig. 6A), there was no obvious connected density that could be assigned to ID2b. It is not clear whether the presence of ID2b directly improves the reconstruction quality via a stabilization of the structure or it is simply due to the higher signal-to-noise of the negative stained images for this construct.
The structural analyses of the EM and SAXS data were performed independently and thus a comparison of the models obtained from each technique provides reliable validation for the proposed structure. Adjusted for resolution, the volumes for each of the constructs generated by SAXS and negative stained EM are in good agreement. To further compare the models, EM density maps were converted to dummy atom models using EM2DAM (ATSAS 3.0) and theoretical SAXS curves were calculated from them and compared to the experimental SAXS curves (Fig. S10). There was good agreement between these curves and, as expected, the EM-based curve contained additional features consistent with finer detail and less movement or degeneracy.

Docking of experimentally and computationally derived domain modules into the EM reconstructions.
For rigid body fitting of data, models corresponding to DBL1x, DBL2x, DBL4ε and DBL5ε were generated using Chimera and Modeller (55,56). Structures of DBL3x and DBL6ε were used directly after modeling regions that were disordered in the crystal structures using the Chimera Modeller loops/refinement plug-in (56). However, ID1 (~200 amino acids) and ID2 (~300 amino acids) could not be modeled as suitable structural templates are not available (Table S3). The domains were placed as shown in Fig. 9A. DBL6ε fit into the expected density at the tip of the molecule and DBL5ε fit into the remaining density in the head. Based on the connection determined in the negative stain reconstruction, DBL3x fit into the tail and DBL4ε was constrained to be adjacent to DBL3x and DBL5ε. DBL1x and DBL2x fit into the base, although the arrangement of these two domains is speculative because the folds of ID1 and ID2 are not known. DBL3x and DBL4ε can be accommodated in an arrangement similar to the structure of the DBL3x-DBL4ε tandem domains (44). Fitted in this way, the two large pores, P1 and P2, observed in NTS-DBL6ε, are created by DBL1x, ID1, DBL2x and DBL4ε, and DBL2x, DBL3x and DBL4ε, respectively.

DISCUSSION
The VAR2CSA ectodomain and N- and C-terminal deletion constructs described in this work are produced in a mammalian expression system that exports them into the media as folded proteins. The purified proteins are very stable, as expected due to the presence of a large number of intra-domain disulfide bonds, and remain soluble and monomeric at concentrations >5 mg/ml, demonstrating that they are suitable for structural characterization. Analysis by SAXS and negative stain EM shows that the protein constructs are homogeneous in terms of overall shape and apparent sizes. Taken together, this argues that the structures discussed here largely represent authentically folded VAR2CSA and folded domains thereof. Additionally, the ab initio envelope of NTS-DBL6ε calculated from SEC-SAXS, described here, contains the same features as published models after making allowances for the presence of the estimated 5% glycosylation (38,39). This suggests that slight differences in constructs and data collection/processing do not affect the overall structure of the molecule.  12)) is observed in all the crystal structures of DBL3x, DBL4ε and DBL6ε. However, the remaining free, surface exposed, cysteine residues potentially are available to form inter-domain disulfide bonds that could stabilize the overall architecture of VAR2CSA. Although the current studies, due to their limited resolution, do not provide direct evidence for the existence of inter-domain disulfide bonds, at least two discrete volumes of stain-excluding density between the spatially distinct regions of the head and DBL4ε are observed in 2D classes of all DBL4ε-containing reconstructions.
This is consistent with the presence of inter-domain disulfide bonds and could account, in part, for the observed rigidity of this region of the structure. It should be noted that only one of these densities, which we term the neck, connecting one end of the head to DBL4ε, is routinely observed in the 3D maps at the typical contour levels used. This most likely reflects a combination of conformational heterogeneity between the head and the body and limitations in image quality and number for the minor tilted views compared to the dominant side-on views.
A striking characteristic of the NTS-DBL6ε structure is the presence of two pores, P1 and P2, that are approximately parallel and traverse the width of the protein, accounting for the layers seen in some 2D classes. Both pores have dimensions that could accommodate CSA 10-12mers or other extended carbohydrate polymers (Fig. 9B, C). In this speculative model, binding of carbohydrate in P1 would involve residues from DBL1x, ID1, DBL2x, DBL4ε and likely ID2a, and in P2 involve residues from DBL2x, DBL3x, DBL4ε and possibly ID2b. Enveloping the carbohydrate in a pore provides a larger surface area for binding interactions while minimizing the accessibility of this functionally-essential surface to immune surveillance. CSA and its non-sulfated homolog chondroitin can be described as stiff worm-like polymers (persistence lengths ~70-80 Å) (60). Based on the structure of CSA (1CSA.PDB), a fully extended 12-mer CSA chain has an end-to-end distance of ~100-115 Å with an estimated minimum RMS end-to-end distance of ~90-100 Å, although it is likely to be closer to the fully extended length in solution. Therefore a single 12-mer chain cannot occupy both pores simultaneously without either large scale rearrangements in the highly disulfide bond stabilized protein or substantial protein-induced hairpin bending of the CSA; either scenario seems unlikely. Many studies have concluded that residues within DBL1x-ID2a are necessary and sufficient for high affinity CSA binding, and the arrangement of domains forming P1 is consistent with these data. Binding studies presented here also show that DBL1x-ID2a has a lower affinity for CSA than NTS-DBL6ε, consistent with involvement of interactions outside of this region, such as DBL4ε in our model. The role of P2 in carbohydrate binding, if any, is less clear. One possibility is that it could be involved in protein-protein interactions either in the knob or during transport from the parasite to the surface of the red blood cell.
A second possibility is that it is a carbohydrate-binding site, although to date, binding studies have not revealed the presence of two high affinity CSA binding sites. Thus, if each binds a distinct carbohydrate chain, then the second site would have to be of much lower affinity than the first and binding would have to be independent. Biologically, this might have some advantage for parasite attachment to the placental extracellular matrix. The CSPG molecules are part of a complex mixture of polymers that form the host extracellular matrix of the intervillous spaces. In the context of this complex host receptor environment, two binding sites on VAR2CSA may facilitate its attachment to a heterogeneous carbohydrate-dense environment. However, at the current resolution it is impossible to clearly determine which of these possibilities, if any, is correct.
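The argument that a single 12-mer cannot thread both pores rests on the chain being stiff relative to its length. As a rough check, the sketch below evaluates the standard worm-like chain (Kratky-Porod) expression for the RMS end-to-end distance, ⟨R²⟩ = 2PL[1 − (P/L)(1 − e^(−L/P))], using the persistence and contour lengths quoted above. This is one common estimate and an assumption made here; the text does not state how its own RMS value was derived.

```python
import math

def wlc_rms_end_to_end(contour_length, persistence_length):
    """RMS end-to-end distance of a worm-like chain (Kratky-Porod model).

    <R^2> = 2 P L [1 - (P / L)(1 - exp(-L / P))], same length units in and out.
    """
    ratio = contour_length / persistence_length
    mean_square = 2.0 * persistence_length * contour_length * (
        1.0 - (1.0 - math.exp(-ratio)) / ratio
    )
    return math.sqrt(mean_square)

# Ranges quoted in the text: contour length ~100-115 Å for a 12-mer,
# persistence length ~70-80 Å.
for contour, persistence in [(100.0, 70.0), (115.0, 80.0)]:
    rms = wlc_rms_end_to_end(contour, persistence)
    print(f"L = {contour:.0f} Å, P = {persistence:.0f} Å -> RMS R = {rms:.0f} Å "
          f"({rms / contour:.0%} of fully extended)")
```

Because the contour length is only about 1.4 persistence lengths, the chain remains at roughly 80% of its fully extended length on average under this model, which supports treating the 12-mer as a nearly rigid rod on the scale of the pores.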
In conclusion, these studies have revealed, for the first time, the structure of VAR2CSA at moderate resolution, its orientation relative to the IRBC membrane and the relative arrangement of domains. The structure contains two clear pores that traverse the molecule and suggest a possible model for host receptor binding. This model provides plausible explanations for the high affinity CSA interaction and the ability to evade initial immune surveillance.

Synthesis and sub-cloning of VAR2CSA gene into pSecTag2-based expression vectors.
The sequence of the codon optimized VAR2CSA gene from strain 3D7 (JQ247428.1) was used as a starting point to generate a synthetic gene (amino acids 59-2641). Serine or threonine residues at 20 sites, identified as sites of potential O-linked glycosylation arising from the mammalian machinery (38), were replaced with codons for alanine or leucine (GeneArt, ThermoFisher Scientific). KpnI and ApaI sites were added to the 5' and 3' end, respectively, for subcloning into the pSecTag2/Hygro2 vector, which contains an N-terminal IgK secretion signal and non-cleavable C-terminal cMyc and hexahistidine tags for protein purification. In some constructs, as detailed in the Fig. 1A legend, the cMyc tag in the vector was replaced by a TEV-cleavable 3X-FLAG-tag using a GBLOCK primer listed in Table S1. In order to generate NTS-DBL6ε, a GBLOCK (IDT) (Table S1) corresponding to amino acids 1-58 was designed and inserted into pSecTag2/Hygro A vector using In Fusion (Takara, Japan) according to the manufacturer's protocol to create a vector that contained amino acids 1-58 followed by KpnI and ApaI sites. Residues 59-2641 were then sub-cloned as before, which resulted in the insertion of a glycine and a threonine residue between native residues 58 and 59. DBL1x-ID2a (aa 59-1019), ID2b-DBL6ε (aa 1016-2641), DBL3x-DBL6ε (aa 1218-2630) and DBL4ε-DBL6ε (aa 1560-2630) were created by PCR using the plasmid DNA of DBL1x-DBL6ε (1 ng/µl) as a template, primers (Table S1) and the Q5 site directed mutagenesis kit (NEB) for polymerase and parental DNA digestion. In all cases, after digestion 0.8 µl of the PCR reaction was introduced by transformation into 20 µl DH10B cells and grown at 37 °C on LB carbenicillin plates overnight. Single colonies were selected and inoculated into 8 ml LB media supplemented with carbenicillin (100 mg/ml). At mid log phase of growth, 1.5 ml was removed for freezer stocks, added to 150 µl autoclaved 80% glycerol in 2 ml cryovials and stored at -80 °C.
The remaining culture was grown for an additional ~2 h and harvested by centrifugation in a swinging bucket rotor for isolation of DNA using QIAprep Spin Miniprep kit (Qiagen), following the standard protocol.
All DNA was sequenced through the coding region to confirm that the sequence was correct (Eurofins Genomics, Lancaster, PA). The translated protein sequences of the constructs are detailed in Table S2.
Plasmid DNA was obtained by scraping cells from the -80 °C freezer stock into LB media supplemented with 100 mg/ml carbenicillin and isolated using the appropriate QIAprep kit.

Protein expression and purification
The expression and purification of proteins is based on Srivatsava et al (44)  pre-equilibrated with SEC Buffer A (10 mM NaKHPO4, pH 6.8, 150 mM NaCl). The protein was eluted with the same buffer and protein-containing fractions were analyzed by SDS-PAGE and Coomassie and/or silver staining. Fractions containing purified proteins (>95% pure) were concentrated to 2-8 mg/ml. The proteins were either used for experiments immediately or snap frozen in liquid nitrogen and stored at -80 °C for future use.

CD measurements
CD spectra (190-260 nm) of proteins at 0.2 mg/ml in 10 mM NaKHPO4, pH 6.8, 100 mM NaF were recorded on a Jasco J-1500 spectrophotometer at 25 °C and 90 °C using a 0.1 cm path length cuvette. Spectra were accumulated at 0.5 nm data intervals at a scan rate of 50 nm/min. For each sample, three scans were collected, averaged and corrected for baseline rotation. Data were converted to molar ellipticity. For thermal denaturation, the ellipticity at 222 nm of proteins at 0.2 mg/ml in 10 mM NaKHPO4, pH 6.8, 100 mM NaCl was recorded on a Jasco J-720 spectrophotometer over the temperature range 25 °C to 85 °C at 1 °C intervals. Data were processed using GraphPad Prism version 5.
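The conversion from measured ellipticity to molar ellipticity mentioned above can be sketched with the standard relations [θ] = θ(mdeg) × M / (10 × c × l), with c in mg/ml and l in cm, and the mean residue form in which M is replaced by the mean residue weight M/(N − 1). Only the 0.2 mg/ml concentration, the 0.1 cm path length and the ~310 kDa mass come from the text; the raw signal of -20 mdeg and the residue count are hypothetical values used for illustration.

```python
# Minimal sketch of the standard CD unit conversions. The raw signal and
# residue count below are hypothetical; the software actually used for the
# conversion is not specified in the text.

def molar_ellipticity(theta_mdeg, molar_mass, conc_mg_ml, path_cm):
    """Molar ellipticity in deg cm^2 dmol^-1 from a raw signal in mdeg."""
    return theta_mdeg * molar_mass / (10.0 * conc_mg_ml * path_cm)

def mean_residue_ellipticity(theta_mdeg, molar_mass, n_residues,
                             conc_mg_ml, path_cm):
    """Mean residue ellipticity, using mean residue weight M / (N - 1)."""
    mean_residue_weight = molar_mass / (n_residues - 1)
    return theta_mdeg * mean_residue_weight / (10.0 * conc_mg_ml * path_cm)

# Conditions from the text: 0.2 mg/ml protein, 0.1 cm cuvette, ~310 kDa.
# Hypothetical inputs: -20 mdeg at 222 nm, 2650 residues.
mre = mean_residue_ellipticity(
    theta_mdeg=-20.0,
    molar_mass=310_000.0,
    n_residues=2650,
    conc_mg_ml=0.2,
    path_cm=0.1,
)
print(f"Mean residue ellipticity: {mre:.0f} deg cm^2 dmol^-1")
```

Expressing the signal per residue makes spectra from constructs of very different sizes, such as NTS-DBL6ε and DBL4ε-DBL6ε, directly comparable.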

Analysis of VAR2CSA proteins binding to CSPG
The extent of binding of the VAR2CSA proteins was assessed in a sandwich ELISA-based assay using bovine corneal CSPG-II (49). The preparation of bovine corneal CSPG-II used here contains low-sulfated CSA chains with 28% 4-sulfated, 3% 6-sulfated and 69% non-sulfated disaccharide moieties.
Briefly, specific wells of a 96-well microtiter plate were coated with bovine corneal CSPG-II by incubation with 50 µl (200 ng/ml CSPG-II in PBS, pH 7.2) at 4 °C overnight. All subsequent steps of this assay were performed at room temperature. Unbound CSPG-II was removed by washing the wells with 100 µl PBS, pH 7.2 containing 0.05% Tween-20 (PBST) and then all wells were blocked with 100 µl of 1% BSA in PBS, pH 7.2 for 2 h to minimize non-specific binding. Binding of the individual VAR2CSA proteins was assessed by serial 2-fold dilutions (50 µl) of the specified VAR2CSA construct (see Fig. S2 legend) into CSPG-II coated and uncoated (blank, to assess non-specific protein binding) wells of the microtiter plate and incubated for 2 h.

Small angle X-ray scattering measurements
Data were collected by in-line SEC-SAXS either at the LiX 16-ID beamline of the National Synchrotron Light Source II (NSLSII), Brookhaven National Laboratory, Upton, NY 11973 or at 18ID of the Advanced Photon Source (APS), Argonne National Laboratory, Argonne, IL (Table 1, S3). For data collection, 250 µl of each sample was applied to a Superose 6 10/300 GL size exclusion column (GE Healthcare) at concentrations between 4-10 mg/ml and eluted at room temperature with a flow rate of 0.5 ml/min in 10 mM NaKHPO4, pH 6.8, 150 mM NaCl. Scattering curves were collected continuously throughout the elution to obtain buffer and protein scattering curves. Data were initially processed using beamline software py4XS (NSLSII) and RAW (APS) and subsequently with programs in the ATSAS Suite (versions 2.8 and 3.0) (53). Ab initio bead models were calculated from 20 candidate models calculated in DAMMIF, averaged using DAMAVER and refined with DAMMIN. For display purposes and superposition, the bead models were converted to volumes within Chimera and superimposed using the

Negative staining and EM analysis
Formvar-coated carbon grids (300 µm mesh; EMS, PA) were prepared by plasma cleaning (air) for 60 s using an Emitech K590X plasma cleaner (Diener Electronics, Germany) immediately prior to use. For NTS-DBL6ε or DBL1x-ID2a, a 4 µl aliquot of the protein alone, or bound to cMyc mAb (25 µg/ml in 10 mM NaKHPO4, pH 6.8, 150 mM NaCl) where relevant, was applied to the grids and incubated at room temperature for 10 s. Excess protein solution was removed from the edge of the grid using thin strips of Whatman No. 1 paper. The grids were washed 3 times by touching them sequentially to the surface of three distilled water drops (35 µl) and excess liquid was removed as before. For staining, the grids were floated sequentially on the top of three 35 µl drops of 1% uranyl acetate in water for 10 s, 10 s and 1 min, respectively, air dried at room temperature for 1 h and stored under anhydrous conditions until needed. EM data were collected using a JEOL 2100 transmission electron microscope equipped with a 4K CCD camera. All images were collected at 200 kV at a pixel size of 2.3 Å/pixel using Digital Micrograph.

Single particle reconstruction from EM micrographs of negative stain images
Single particle reconstructions were performed in Relion 3.1 (62) and validated using tools and protocols in Scipion 2 (63). Particles from 5-10 images were picked automatically using a range of minimum and maximum particle diameters and thresholds with the Lorenztian of Gaussian algorithm to estimate optimal values. Particles were then picked from the entire set of images using the same algorithm and the optimized numbers for each parameter and extracted, using a box size of 1.5 times the longest dimension of the particle, with a pixel size of 6.9 Å to improve signal to noise. The particle stack was subjected to 2-3 rounds of 2D classification to obtain a starting particle set for 3D reconstruction. Initial starting models were computed to 25 Å resolution using a randomly chosen subset of 4000 particles.
Subsequently, all of the particles were used in 3D classification with the calculated starting model as a reference. The best class or classes were subjected to 3D refinement and post-processed to obtain a final masked map. The reported resolution was obtained using the gold standard FSC (gsFSC) during post-processing. In some cases, a second round of 2D/3D classification and refinement improved the shape of the gsFSC. In these cases, the particles from the first 3D refinement were subjected to an additional round of 2D classification followed by 3D classification with a single class. The resultant map was further refined and post-processed, as before, to yield the final map described in this work. The maps for all constructs were validated using the 3D reprojection and overfitting tools in Scipion 2 with data from the final refinement in Relion 3.1 to confirm that there was no overfitting within the resolution ranges used. A summary of the results of reconstruction and 3D reprojection is given in Table 2 and Figs. S4-9.
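As an illustration of how a resolution value is read from a gold standard FSC curve, the sketch below finds the spatial frequency at which the curve first drops below a threshold and converts it to Ångströms. It is a simplified stand-in for what post-processing software reports, not the Relion implementation. The 0.143 threshold is the criterion commonly used for gsFSC and is an assumption here, since the text does not state the cutoff; the function name and the example curve are hypothetical and do not represent data from this work.

```python
def fsc_resolution(freqs, fsc, threshold=0.143):
    """Resolution (Å) where the FSC curve first falls below `threshold`.

    freqs: spatial frequencies in 1/Å, in ascending order.
    fsc:   FSC values at those frequencies.
    Returns None if the curve never crosses the threshold from above.
    """
    if len(freqs) != len(fsc):
        raise ValueError("freqs and fsc must have the same length")
    for i in range(1, len(fsc)):
        if fsc[i] < threshold <= fsc[i - 1]:
            # Linear interpolation between the two bracketing shells.
            f0, f1 = freqs[i - 1], freqs[i]
            c0, c1 = fsc[i - 1], fsc[i]
            f_cross = f0 + (c0 - threshold) * (f1 - f0) / (c0 - c1)
            if f_cross <= 0:
                return None
            return 1.0 / f_cross
    return None


# Synthetic curve for illustration only.
example_freqs = [0.01, 0.02, 0.03, 0.04, 0.05]
example_fsc = [1.0, 0.9, 0.5, 0.1, 0.0]
example_resolution = fsc_resolution(example_freqs, example_fsc)
```

With the synthetic curve above, the crossing falls between the 0.03 and 0.04 Å⁻¹ shells, giving a resolution of roughly 25.7 Å. A curve that never falls below the threshold returns `None`, which signals that the resolution is limited by sampling rather than by the data.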

Basic homology modeling of individual domains
Templates for homology modeling were identified using the BLAST server (https://blast.ncbi.nlm.nih.gov/Blast.cgi) and visualized using Chimera. Basic homology modeling was performed using the MODELLER interface within Chimera. For each domain, models were evaluated by visual inspection and by their GA341 and zDOPE scores (Table S4). Residues 1-9, 402-568, 906-1204, 1945-

of the members of the Yeager lab for their patience, kindness and friendship. We thank beamline staff at NSLS-II, including Drs Vivian Stojanoff, Sirish Chodankar and James Byrnes, for beamline support and Dr Lin Yang for thoughtful discussions and suggestions. We also thank Dr Suresh Nayaranan for help collecting SAXS data at the Beyond Rg BioSAXS short course at APS. We thank Dr Matt Swullius for help with the EM studies.

Statement of contributions:
LG performed all protein purification and enzyme assays, performed and analyzed all CD experiments, wrote and edited the materials and methods for these sections, and collected SAXS and EM data. MCB collected SAXS data, processed and evaluated all SAXS data and performed basic modeling. JMF processed and evaluated all EM data. MCB, CDG and JMF conceived the experiments, analyzed the results, formulated the discussion and wrote the manuscript. All authors have read the manuscript and agree with the conclusions presented.

Conflict of Interest:
The authors declare that they have no conflicts of interest with the contents of this article.

FOOTNOTES
The studies performed were supported by grant R01 AI104844 (D.C.G.) from the National Institute of

For each construct, the numbers at either end denote the beginning and ending amino acid sequence numbers. I, II and IV contain a non-cleavable C-terminal cMyc tag and hexahistidine tag, as coded in the commercial vector. For the III and V constructs, the C-terminal tag is replaced with a TEV-cleavable 3X-FLAG tag and a hexahistidine tag, as defined in Table S2, GBLOCK2. In addition, P1 contains a glycine and threonine residue inserted between native residues 58 and 59 as a cloning artefact.

reveal an asymmetric molecule with a larger broad base and smaller, narrower top. Ab initio envelopes for DBL1x-ID2a (yellow) and DBL3x-DBL6ε (green) show that the envelopes of the two deletion constructs account for the NTS-DBL6ε envelope within the limits of the resolution. For each construct, the averaged/filtered ab initio bead models were converted to pseudo-surfaces using Chimera (55).