Structural characterization of the extracellular domain of CASPR2 and insights into its association with the novel ligand Contactin1

Contactin-associated protein-like 2 (CNTNAP2) encodes CASPR2, a multidomain single transmembrane protein belonging to the neurexin superfamily that has been implicated in a broad range of human phenotypes including autism and language impairment. Using a combination of biophysical techniques, including small angle X-ray scattering, single particle EM, analytical ultracentrifugation and bio-layer interferometry, we present novel structural and functional data that relate the architecture of the extracellular domain of CASPR2 to a previously unknown ligand, Contactin1 (CNTN1). Structurally, CASPR2 is highly glycosylated and has an overall compact architecture. Functionally, we show that CASPR2 associates with µM affinity with CNTN1 but, under the same conditions, does not interact with any of the other members of the contactin family. Moreover, by using dissociated hippocampal neurons we show that microbeads loaded with CASPR2, but not with a deletion mutant, co-localize with transfected CNTN1, suggesting that CNTN1 is one endogenous ligand for CASPR2. These data provide novel insights into the structure and function of CASPR2, suggesting a complex role of CASPR2 in the nervous system. In myelinated to

juxtaparanodal region of the axon where it appears to associate with the immunoglobulin domains of TAG-1 (transient axonal glycoprotein-1) to form a scaffold which clusters the potassium channels Kv1.1 and Kv1.2 (2)(3)(4).
CASPR2 is predicted to be a type I transmembrane protein of 1331 amino acids, with an extracellular domain followed by a single transmembrane domain and a short (48 residues) intracellular domain that terminates with a class II PDZ recognition motif. Computational predictions suggest that CASPR2 has 12 putative N-linked glycosylation sites and 36 Cys residues, likely making 18 disulfide bonds, and forms 8 independently folded domains: four laminin, neurexin, sex-hormone-binding globulin (LNS) domains, two epidermal growth factor (EGF) domains, one discoidin domain, and one fibrinogen-like domain (Fig. 1A). CASPR2 shares an overall domain organization with α-neurexin-1 despite a relatively low amino acid identity (~23% identity, ~39% similarity). However, distinctive features such as a discoidin domain in place of the first LNS domain and a fibrinogen-like domain in place of the fourth LNS domain suggest a different overall structural architecture. No information about the three-dimensional structure of CASPR2, other than that inferred from sequence homology, is currently available. Functionally, only TAG-1 (contactin 2 or CNTN2) has thus far been identified as an extracellular ligand for CASPR2 (2)(3)(4).
Individuals in a cohort of Amish children, homozygous for a frame-shift mutation (single-base G deletion at nucleotide 3709 in exon 22) involving the CNTNAP2 gene, present focal seizures and autistic regression after the onset of the seizures (5,6). In these patients, surgical biopsy revealed severe cortical dysplasia in all children and periventricular leukomalacia in one girl, suggesting that CNS myelination was affected. The frame-shift mutation introduces a premature stop codon at position 1253 (CASPR2-1253*), resulting in a protein that is devoid of its transmembrane and intracellular domains. In vitro, we have shown that CASPR2-1253* folds properly but is secreted into the extracellular space, and thus becomes nonfunctional because the protein is no longer tethered to the cell membrane (7). In humans, postmortem studies revealed that the juxtaparanodes were disrupted in multiple sclerosis lesions. In particular, the localization and expression levels of CASPR2 and TAG-1 were reduced around the lesions and absent in lesion areas (8), although it remains unclear whether nodal disruption represents a cause or a consequence of the disease.
Another cell adhesion molecule of the CNTN family, Contactin-1 (CNTN1), has been shown to be essential for the organization of paranodal regions in myelinated axons. CNTN1 is composed of six Ig domains followed by four FNIII domains and a glycosylphosphatidylinositol (GPI) anchor. At the amino acid level, the extracellular domain of human CNTN1 is ~48.6% identical (~74% similar) to the extracellular domain of human TAG-1. CNTN1 is required for the cell surface localization of CASPR1, with which it forms a complex during the myelination process (9)(10)(11). CNTN1 has also been implicated in hippocampal neurogenesis and cell proliferation as well as in synaptic plasticity and memory in the adult mouse (12). Expression of CNTN1 in both neurons and oligodendrocytes suggests a role in myelination, which has been confirmed by low levels of myelin basic protein (MBP) in the CNTN1 KO mouse and by its interaction with other molecules involved in myelination (13)(14)(15)(16).
Here, we present data on the overall architecture of the extracellular domain of CASPR2 and report the identification of a new endogenous ligand, CNTN1. Intriguingly, although CASPR2 and CNTN1 expression overlaps at the paranodes during early development (17) suggesting a functional relationship, the association between the two proteins has never been reported. We used small angle X-ray scattering, analytical ultracentrifugation, and single particle negative stain electron microscopy to build a structural model of the extracellular domain of CASPR2. We also show that CNTN1 directly binds to CASPR2 with µM affinity through specific domains. Remarkably, under the same experimental conditions, we could not detect CASPR2 association with TAG-1, its putative ligand (2)(3)(4). Moreover, by using dissociated hippocampal neurons we show that microbeads loaded with CASPR2 co-localize with the transfected CNTN1, whereas beads loaded with a deletion mutant do not, suggesting that CNTN1 is one endogenous ligand for CASPR2. Taken together these data provide novel insights into the structure of CASPR2 and the function of both CASPR2 and the new extracellular ligand, CNTN1.

EXPERIMENTAL PROCEDURES
Cloning of the proteins - The full-length CASPR2, TAG-1 and CNTN1 cDNAs were acquired from Open Biosystems (Thermo Fisher Scientific Inc, Pittsburgh, PA). The entire coding sequences of the proteins were fully sequenced and inserted into a pcDNA3.1 vector with either the FLAG or HA epitope cloned after the leader peptide. The entire extracellular domains of CASPR2, TAG-1, CNTN1 and CNTN5 were cloned into a pCMV6-XL4 expression vector (18). This vector has a 3C protease cleavage site (LEVLFQ/GP) after the protein sequence and before the beginning of the human Fc sequence. All CASPR2 and CNTN1 deletion constructs were cloned into the pCMV6-XL4 vector by inserting a NotI site before the first domain of interest and an XbaI site after the last domain of interest (Table 1). The XbaI site in the vector was placed at the beginning of the 3C protease sequence. The plasmids encoding CNTN3, 4 and 6 were kindly provided by Dr. Woj Wojtowicz, University of California, Berkeley.
Cell culture and transfection - Human embryonic kidney 293 cells lacking N-acetylglucosaminyltransferase I (GnTI) activity (HEK293 GnTI-) were obtained from American Type Culture Collection (ATCC, Manassas, VA). With these cells glycosylation remains restricted to a seven-residue homogeneous oligosaccharide (19). Cells were grown in Dulbecco's Modified Eagle Medium (DMEM) supplemented with 2 mM glutamine and 10% (v/v) fetal bovine serum (FBS), maintained in a humidified incubator at 5% v/v CO2 and 95% v/v air, and periodically tested to ensure the absence of mycoplasma contamination. Stable cell lines were made for each mutant following the calcium phosphate protocol and kept in DMEM/10% FBS supplemented with 500 µg/ml G418 (Geneticin) (Sigma, St. Louis, MO).
Expression and purification of proteins - Constructs encoding the entire extracellular domain of CASPR2, CNTN1 or TAG-1 fused to Fc were transfected into HEK293 GnTI- cells and selected by growth in the antibiotic G418 (20). For protein expression, cells were maintained at 37°C and 5% v/v CO2 in Dulbecco's modified Eagle's medium containing up to 2% v/v FBS. Proteins were affinity purified using Protein-A Sepharose 4 fast flow resin (GE Healthcare, Waukesha, WI) in 150 mM NaCl, 20 mM TRIS pH 8.0 and subsequently cleaved with 3C protease to remove the Fc fragment. To ensure homogeneity of the preparation by absence of degradation or aggregation products, affinity purified proteins were re-purified and buffer exchanged by gel filtration using a Superdex 200-10/300 DE column (GE Healthcare, Waukesha, WI) in 150 mM NaCl, 10 mM Hepes, pH 7.4. Fractions corresponding to CASPR2 were collected and concentrated up to ~9.6 mg/mL. Aliquots of the purified proteins were separated by SDS-PAGE to check for purity and integrity. Lysozyme, for use as a scattering standard, was solubilized in 150 mM NaCl, 40 mM sodium acetate, pH 3.8, and dialyzed against the same buffer, with the final dialysate used for solvent blank measurements. Protein concentrations were determined at 280 nm using extinction coefficients expressed as E0.1% for CASPR2 (1.516 L/g) calculated using Protparam (21).
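The concentration determination described above follows the Beer-Lambert law. A minimal sketch, assuming a 1 cm path length and an optional dilution factor (both are illustrative assumptions, as are the function and parameter names; only the E0.1% value of 1.516 comes from the text):

```python
def concentration_mg_per_ml(a280, e_01pct=1.516, path_cm=1.0, dilution=1.0):
    """Protein concentration from absorbance at 280 nm.

    e_01pct is the absorbance of a 1 mg/mL (0.1% w/v) solution over 1 cm,
    i.e. an extinction coefficient in L g^-1 cm^-1 (1.516 for CASPR2 in the text).
    path_cm and dilution are assumed values, not taken from the text.
    """
    if e_01pct <= 0 or path_cm <= 0:
        raise ValueError("extinction coefficient and path length must be positive")
    # Beer-Lambert: A = E * c * l, so c = A / (E * l); g/L is the same as mg/mL.
    return a280 * dilution / (e_01pct * path_cm)


# A 10-fold diluted sample reading A280 = 1.455 would correspond to ~9.6 mg/mL.
stock = concentration_mg_per_ml(1.455, dilution=10.0)
```

Because E0.1% from Protparam is computed from the amino acid sequence, the result refers to the mass of the polypeptide and does not include the attached glycans.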
N-terminal sequencing and mass spectrometry determinations - Edman degradation was performed by the Molecular Structure Facility core at the University of  [22] for details on intensity measurements, which allow up to two samples to be measured per cell). All data were analyzed using UltraScan-III, rev. 2029 (23). Time- and radially-invariant noise was removed during analysis with the 2-dimensional spectrum analysis (2DSA) (24), and diffusion-corrected sedimentation coefficient distributions were calculated with the enhanced van Holde-Weischet method (25). Parsimonious regularization of the 2DSA results and molecular weight determinations were performed with the genetic algorithm (GA) analysis (26,27). The GA solution was then analyzed by Monte Carlo analysis to determine confidence limits for the determined parameters (28). All calculations were performed on the Lonestar cluster at the Texas Advanced Computing Center at the University of Texas at Austin, and on the Jacinto cluster at the Bioinformatics Core Facility at UTHSCSA (29). The partial specific volume was calculated from the amino acid composition using UltraScan, based on the method by Durchschlag (30) and accounting for total carbohydrate, and was found to be 0.7217 cm³/g.
Small angle X-ray scattering data analysis of CASPR2 - SAXS data recorded as I(q) vs q, where q = 4π sin θ/λ (2θ is the scattering angle and λ the X-ray wavelength, 1.54 Å, Cu Kα), were collected from protein samples and matched solvent blanks (ultrafiltrate buffers for CASPR2 and last step dialysate for lysozyme) at 20°C using an Anton Paar SAXSess line collimation instrument at the University of Utah. Scattering intensities were detected on 2D position-sensitive image plates (10 mm slit and integration width) and data reduction was performed using SAXSQuant1D (Anton Paar proprietary software) as described in (31). The final 1D-scattering profiles for each protein in solution were determined by subtracting the matched solvent scattering contributions from the respective sample scattering and applying a geometry correction (10 mm slit and integration width) to take into account smearing effects caused by the slit collimation of the X-ray source. A concentration series for CASPR2 was measured between 1.0 and 9.6 mg/mL in 10 mM Hepes buffer pH 7.4 and 150 mM NaCl.
Guinier plots (ln(I(q)) vs q²) were initially evaluated using the program SAXSQuant1D. The Rg and I(0) for CASPR2 and the lysozyme standard were determined from the geometry-corrected SAXS data using linear regression and extrapolation of ln(I(q)) vs q² in the very low angle region of the plot (qRg ≤ 1.3). The probable inter-atomic distance distribution of vector lengths for each of the proteins, or pair distance distribution function, p(r), was calculated using the indirect Fourier transform method of Svergun as implemented in the program GNOM (32). This method also incorporates a geometric correction factor that takes into account the slit geometry of the instrument to produce a real-space vector length distribution. Structural parameters derived from the p(r), including I(0), Rg and Dmax, were determined for the CASPR2 proteins and lysozyme standards at various protein concentrations.
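The Guinier analysis above rests on the approximation ln I(q) = ln I(0) - (Rg²/3) q², valid at low angle. The following is a minimal sketch of such a fit, not the SAXSQuant1D procedure: it assumes desmeared data sorted by increasing q, and the function names, the starting window of 10 points, and the synthetic test curve are all illustrative assumptions.

```python
import math


def linear_fit(x, y):
    """Ordinary least-squares line y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def guinier_fit(q, intensity, q_rg_max=1.3, n_start=10, max_iter=50):
    """Fit ln I vs q^2, iterating the fit window until q * Rg <= q_rg_max."""
    n_used = min(n_start, len(q))
    for _ in range(max_iter):
        x = [qi ** 2 for qi in q[:n_used]]
        y = [math.log(ii) for ii in intensity[:n_used]]
        slope, intercept = linear_fit(x, y)
        if slope >= 0:
            raise ValueError("non-negative Guinier slope; Rg is undefined")
        rg = math.sqrt(-3.0 * slope)
        # Keep only the low-angle points inside the Guinier limit (at least 3).
        n_new = max(3, sum(1 for qi in q if qi * rg <= q_rg_max))
        if n_new == n_used:
            break
        n_used = n_new
    return rg, math.exp(intercept), n_used


# Synthetic, noise-free curve with Rg = 45 Å and I(0) = 100 (arbitrary units).
q = [0.005 + 0.001 * i for i in range(40)]
intensity = [100.0 * math.exp(-(45.0 ** 2 / 3.0) * qi ** 2) for qi in q]
rg, i0, n_used = guinier_fit(q, intensity)
```

For a particle with Rg of about 45 Å, the qRg ≤ 1.3 criterion restricts the fit to q below roughly 0.029 Å⁻¹, which is why only the very low angle region is used.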
The determination of I(0) from the protein samples was used to evaluate sample monodispersity against a lysozyme standard. Dilute monodisperse proteins of the same X-ray scattering (electron) density in solution adhere to the relationship I(0)/(c × MW) = K, where MW is the calculated (expected) molecular mass of the scattering protein (kDa), c is the protein concentration (mg/mL) and K is a constant. Lysozyme is known to be monomeric and monodisperse under the conditions measured (33). Thus the K value derived from the lysozyme I(0) data can be used as a standard to evaluate the MW of CASPR2, provided that protein concentrations are known within ~5%. When the K value of the lysozyme data is normalized to 1, the resulting K for the CASPR2 constructs spanning the concentration series is 1.16, i.e., the MW of the CASPR2 proteins is 16% higher than the calculated MW of the peptide chain. However, this MW was within ~5% of the MW determined by MS.
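The I(0)-based mass estimate can be sketched as follows, assuming the proportionality between I(0) and c × MW described above. The function names and the I(0) and concentration values in the usage lines are hypothetical; the lysozyme mass of 14.3 kDa is the commonly quoted value for hen egg white lysozyme and is not stated in the text.

```python
LYSOZYME_MW_KDA = 14.3  # assumed mass of the standard (hen egg white lysozyme)


def k_value(i0, conc_mg_per_ml, mw_kda):
    """K = I(0) / (c * MW) for a dilute, monodisperse protein."""
    return i0 / (conc_mg_per_ml * mw_kda)


def mw_from_standard(i0_sample, conc_sample, i0_std, conc_std,
                     mw_std_kda=LYSOZYME_MW_KDA):
    """Apparent MW of a sample, using K calibrated on a standard protein."""
    k_std = k_value(i0_std, conc_std, mw_std_kda)
    return i0_sample / (conc_sample * k_std)


def relative_k(i0_sample, conc_sample, expected_mw_kda, i0_std, conc_std,
               mw_std_kda=LYSOZYME_MW_KDA):
    """K of the sample with the standard's K normalized to 1."""
    return (k_value(i0_sample, conc_sample, expected_mw_kda)
            / k_value(i0_std, conc_std, mw_std_kda))


# Hypothetical measurements (arbitrary intensity units), for illustration only.
mw_apparent = mw_from_standard(i0_sample=320.0, conc_sample=2.0,
                               i0_std=14.3, conc_std=1.0)   # 160 kDa
k_rel = relative_k(320.0, 2.0, 138.481, 14.3, 1.0)           # about 1.16
```

With these illustrative inputs, a relative K of about 1.16 against the 138.481 kDa peptide mass corresponds to an apparent mass of about 160 kDa, which is how the two numbers quoted in the text relate to each other.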
Ab initio and Rigid-body modelling -DAMMIN (34) ab initio bead modelling was performed against CASPR2 SAXS data using the ATSAS online server. Ten individual modelling runs were performed. The spatial alignment and averaging/bead occupancy corrections were calculated using DAMAVER and DAMFILT.
Rigid-body modelling using CORAL was set up using the SWISS-MODEL derived homology models for each domain of CASPR2 with the addition of bound glycans. The glycan moiety corresponding to Man5GlcNAc2 (a branched oligosaccharide of two GlcNAc and five Man residues) was added to Asn residues using the program GLYCOSYLATION (part of the new ATSAS 2.7.1 software release; Matt Franklin and Maxim Petoukhov, unpublished). The modified Asn residues carry a combined glycan mass of ≈15 kDa and are reported below. We used the program GLYCOSYLATION to model each heptasaccharide at each N-linked attachment site (Table 2) as a rigid body during CORAL refinement. GLYCOSYLATION draws from a database of glycan structures typically found on glycosylated proteins. CORAL modelling was performed using the glycan-modified CASPR2 domains as rigid bodies, with 15-20 Å distance constraints imposed between the C- and N-termini during refinement. Refinement was performed against the desmeared (i.e., geometry-corrected) SAXS data across the concentration series for CASPR2. Twenty individual CORAL modelling runs were performed. The final fits of the CASPR2 models against the SAXS data were assessed using CRYSOL, with a background constant correction enabled. In all instances (ab initio and CORAL modelling) discrepancies between the final models and data were evaluated using the traditional reduced χ² test and the probability of similarity obtained from variance/covariance correlation map assessments (35). The SAXS data and resulting models have been deposited in the SASBDB database (36) with the following accession code: SASDBR2.
Homology modeling of CASPR2 - Swiss Model Server was used to obtain structural templates for all of the CASPR2 domains. A total of eight sequences (ranging between two and sixty residues) linking various LNS and EGF domains did not have a suitable three-dimensional template (Table 1).
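The reduced χ² discrepancy used to judge model fits can be illustrated with a short sketch. This is a generic formulation, not the CRYSOL or CORAL implementation; the function name and the n_fitted argument (number of fitted parameters subtracted from the point count) are assumptions, and conventions for the degrees of freedom differ between programs.

```python
def reduced_chi_squared(i_exp, sigma, i_model, n_fitted=0):
    """Reduced chi-squared between experimental and model intensities.

    i_exp, sigma, i_model: equal-length sequences of experimental intensities,
    their standard errors, and model intensities on the same q grid.
    n_fitted: number of fitted parameters (assumed convention: dof = N - n_fitted).
    """
    if not (len(i_exp) == len(sigma) == len(i_model)):
        raise ValueError("all inputs must have the same length")
    dof = len(i_exp) - n_fitted
    if dof <= 0:
        raise ValueError("not enough data points for the number of parameters")
    total = sum(((e - m) / s) ** 2 for e, s, m in zip(i_exp, sigma, i_model))
    return total / dof


# Toy data: a model that deviates from each point by exactly one standard error.
chi2 = reduced_chi_squared([1.0, 2.0, 3.0], [0.1, 0.1, 0.1], [1.1, 2.1, 3.1])
```

Values near 1 indicate that the model deviates from the data by about as much as the stated errors allow; values well below 1 often point to overestimated errors rather than an exceptionally good model.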
EM images for class averages were collected using a JEOL 1200 operated at 80 kV and recorded on Kodak film at a magnification of 60,000x. The range of defocus values was 0.5 to 1.1 microns underfocus. Approximately 4,000 particles were analyzed using multivariate statistics, image classification, and averaging. Particles were selected interactively using the WEB display program. A second pass selection of properly centered particles was done interactively, and particles were aligned and classified by reference-based alignment and K-means classification (51 classes) using the SPIDER suite (38).
Biolayer interferometry (BLI) analysis - BLI binding experiments were conducted using a BLItz instrument (ForteBio, Menlo Park, CA) at room temperature. Anti-Human Fc Capture (AHC) Biosensors were pre-wetted for 10 min in 300 µL of 10 mM Hepes pH 7.4, 150 mM NaCl, 10 mM CaCl2, 10 mM MgCl2 and 5% w/v BSA buffer prior to use. Subsequently, the sensor tips were incubated with conditioned medium of transiently transfected CASPR2 or CNTNs for 10 minutes to capture the expressed protein. The binding reaction occurred in a 4 µL drop containing various concentrations of purified proteins under agitation. Both association and dissociation were allowed to occur for 90 sec. Nonspecific binding and instrument noise were subtracted by using a sensor tip saturated with Fc fragment alone.
Cell based binding assay - HEK293 cells were transfected with HA_CASPR2 or FLAG_CNTN1 FL following the calcium phosphate protocol in 60 mm plates. After overnight incubation, cells were transferred to 24-well plates on poly-D-lysine-coated glass coverslips (12 mm diameter) at a density of 3.5 × 10^4 cells/well. After ~6 hours of incubation at 37°C, 1-5 µM of purified CASPR2 or CNTN1 was added to the cells in conditioned media (Dulbecco's Modified Eagle Medium (DMEM), 0.1% w/v bovine serum albumin (BSA), 20 mM HEPES pH 7.4) and the cells were incubated at 4°C for 16 hours. After two washes with cold DMEM and one with PBS, cells were fixed with cold 4% paraformaldehyde (PFA) for 15 minutes at 4°C. Cells were washed 3 times with PBS and incubated with anti-human IgG (1:1000, Santa Cruz Biotechnology, Dallas, TX) in blocking solution (2% w/v normal donkey serum, 2% w/v BSA in PBS) for 1.5 hours at room temperature. After three washes with PBS, anti-HA (1:1000, Roche, Basel, Switzerland) or anti-FLAG (1:5000, Sigma, St. Louis, MO) was added to the cells in blocking solution and incubated for 2 hours, followed by three washes with PBS and incubation with secondary antibodies conjugated to Alexa Fluor-488 and Cy3 (1:500, Jackson ImmunoResearch, West Grove, PA) for 1 hour. Cover slips were then mounted onto glass microscope slides using Fluoromount G (Southern Biotechnology, Birmingham, AL) and analyzed with a Zeiss LSM 700 confocal microscope with a Cooke SensiCam charge-coupled device (CCD) cooled camera fluorescence imaging system.
Bead Adherence Experiments - Non-fluorescent streptavidin-labeled 1 μm magnetic beads (Thermo Scientific, aqueous suspension at 10 mg/mL, Waltham, MA) were washed in PBS containing 100 μg/mL BSA (PBS/BSA) and incubated with biotin-conjugated anti-human IgG-Fc (Jackson ImmunoResearch, West Grove, PA) at 1.2 μg antibody per μl of beads in PBS/BSA at 4°C for 2 hours. The beads were then rinsed with PBS/BSA and further incubated with each of the soluble CASPR2-Fc constructs in conditioned DMEM at 4°C overnight. After an additional wash in PBS/BSA, the beads were sprinkled onto hippocampal neuron cultures (1 μl beads/coverslip), left for 24 hours in the CO2 incubator, and subsequently fixed.
Immunocytochemistry -Neurons at variable DIVs (ranging from DIV 7 to DIV 12) were fixed in 4% v/v paraformaldehyde in PBS for 15 minutes. Cells were then incubated in blocking solution (PBS containing 0.1% v/v Triton X-100, 2% v/v normal sheep serum, and 0.02% w/v sodium azide) for 1 hour. All antibodies used were diluted in blocking solution. Primary antibodies used were: anti-Flag (1:5000, Sigma, St. Louis, MO) and anti-MAP2 (1:5000, Abcam, Toronto, ON). Cells were incubated in primary antibody-containing solution at 4°C overnight, then washed with PBS three times. Secondary antibodies conjugated to Alexa Fluor-488, Cy-3, and 647 were generated in donkey toward the appropriate species (1:500, Jackson ImmunoResearch, West Grove, PA). Cover slips were then mounted onto glass microscope slides using Fluoromount G (Southern Biotechnology, Birmingham, AL). Labeled cells and magnetic beads were visualized by immunofluorescence on a Zeiss LSM 700 confocal microscope with a Cooke SensiCam charge-coupled device (CCD) cooled camera fluorescence imaging system. The conditions for capturing images and the settings for thresholds were kept identical throughout each series of experiments.

RESULTS
Biochemical characterization of the purified extracellular domain of CASPR2 - The construct generating the secreted extracellular domain of CASPR2 (Fig. 1A) extends from the first methionine of the leader peptide to residue S1261 (CASPR2-1261), located immediately before the transmembrane domain. As the secreted CASPR2 protein is expressed using the native leader peptide, we wanted to determine the N-terminal sequence of the mature protein. Five cycles of Edman degradation unambiguously showed that CASPR2 starts at Ala28, consistent with predictions using bioinformatics tools. Because CASPR2 is highly glycosylated, to simplify structural analyses we expressed it using glycosylation-deficient HEK293 GnTI- cells, which add only a Man5GlcNAc2 (mass = 1234 Da) to each N-linked glycosylation site. CASPR2-1261 was subjected to accurate mass determination by MALDI-TOF. Whereas the peptide mass of the expressed protein is calculated to be 138,481 Da, mass spectrometry of CASPR2 revealed a molecular mass of 153,229 Da. The difference of 14,748 Da between the two values is explained by the glycosylation of the protein.
Dividing this mass difference by the mass of a single Man5GlcNAc2 unit, the occupancy of the potential N-linked sites was estimated to be 11.9 units per molecule on average, a value consistent with 12 potential N-linked glycosylation sites. To ensure sample monodispersity for subsequent experiments, the purified CASPR2-1261 was analyzed by size exclusion chromatography (SEC), yielding a single peak eluting at 12.6 mL, corresponding to an apparent molecular weight (MW) of ~140 kDa, consistent with the determined MW of 153 kDa (Fig. 1B).
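The occupancy estimate follows from simple arithmetic on the masses quoted above. A minimal sketch using only those three values (the function name is illustrative):

```python
def glycan_occupancy(measured_mass_da, peptide_mass_da, glycan_mass_da):
    """Return (mass attributed to glycans, average number of glycan units)."""
    glycan_total = measured_mass_da - peptide_mass_da
    return glycan_total, glycan_total / glycan_mass_da


# Values from the text: MALDI-TOF mass, calculated peptide mass, Man5GlcNAc2 mass.
glycan_total, units = glycan_occupancy(153_229, 138_481, 1_234)
# glycan_total is 14,748 Da; units is about 11.95, i.e. close to all 12 sites occupied.
```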
Analytical ultracentrifugation indicated that CASPR2 is a monomer in solution - To further characterize the oligomerization behavior, mass, and shape distributions of CASPR2-1261, we performed sedimentation velocity (SV) experiments over four loading concentrations from 1.24 µM to 10.68 µM. The typical half-parabola shape seen for reversible monomer-dimer equilibria, and the increase in sedimentation coefficient with increasing concentration expected for a dimerization, are absent (39). Analysis of the SV experiment by the van Holde-Weischet method (25) revealed identical sedimentation coefficient distributions for all concentrations, with >90% of the sample displaying a homogeneous species with a sedimentation coefficient consistent with the monomeric molecular weight, and, in these samples, a small amount (<10%) of aggregate. A small concentration-dependent non-ideality effect was evident, which reduced the S-value slightly at higher concentrations (7.76S for 1.24 μM, 7.76S for 2.51 μM, 7.62S for 4.19 μM, and 7.04S for 10.68 μM). Such an effect is not unexpected, especially for glycosylated proteins, where sugar moieties can contribute to crowding and interaction effects. The results for the van Holde-Weischet analysis are summarized in Fig. 1C. To further confirm the molecular weight of the major species we performed a global genetic algorithm-Monte Carlo analysis (27,28) on the three lower concentration samples. This analysis revealed a molecular weight of 150.1 kDa for the major species (95% confidence intervals: 142.8, 157.4 kDa), which is in excellent agreement with the monomer molecular weight determined by MALDI-TOF, and a frictional ratio of 1.39 (95% confidence intervals: 1.35, 1.43), indicating a mostly globular conformation. Taken together, these observations strongly indicate that the extracellular domain of CASPR2 is monomeric in solution and well folded.
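The direction of the concentration dependence can be checked with a simple straight-line fit to the four S-values quoted above. This is only an illustrative sketch and not part of the UltraScan analysis: a linear model is an assumption, four points give a rough estimate at best, and the function name is hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares line y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


# Loading concentrations (µM) and sedimentation coefficients (S) from the text.
conc_um = [1.24, 2.51, 4.19, 10.68]
s_values = [7.76, 7.76, 7.62, 7.04]

slope, s0 = fit_line(conc_um, s_values)
# A negative slope means s decreases with concentration (non-ideality);
# a positive slope would instead be expected for self-association.
# s0 is the value extrapolated to zero concentration under the linear assumption.
```

With these inputs the slope is negative and the extrapolated value lies slightly above the lowest-concentration measurements, in line with the interpretation of mild non-ideality rather than dimerization.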
Small angle X-ray scattering (SAXS) - SAXS data were measured from samples consisting of the entire extracellular domain of CASPR2 (CASPR2-1261) (Fig. 1 and Table 3). Guinier plots of the SAXS data are linear (40) (Fig. 2A and inset) and there is no significant concentration dependence of the radius of gyration (Rg, ~45 Å) or of the normalized forward scattering intensity at zero angle, I(0) (Fig. 2B-C). Further, there is good agreement between Rg values determined by Guinier and P(r) analysis. P(r) analysis also indicates that the maximum linear dimension (dmax) of CASPR2-1261 is 140 Å (Fig. 2D), and the slightly extended asymmetry to the right side of the peak is consistent with a relatively compact multi-domain protein. Using lysozyme as a standard (33) and relative I(0) values, the MW of CASPR2 in solution was estimated to be 160 kDa, consistent with the monomeric form of the protein measured by mass spectrometry (153.2 kDa). The slightly higher than expected MW (<5% difference) is likely due to the influence of N-linked glycans. Taken together, these solution scattering measurements demonstrate that these protein preparations are free of non-specific aggregation and inter-particle interference effects and are thus suitable for more detailed structural analysis.
Ab initio modeling - To obtain an initial assessment of the three-dimensional shape of the extracellular domain of CASPR2, we used an established ab initio bead-modelling approach implemented in the DAMMIN software package ver. 5.3 (41). With this procedure, we derived an ensemble of molecular shapes that best describe the scattering data. All of the models for CASPR2-1261 had an excellent fit to the data (χ²: ~0.76-0.77; correlation map p > 0.01 (35)), and the normalized spatial discrepancy values of 0.77-0.90 (42) between the individually calculated shapes indicated good correspondence between the shape reconstructions. Analyses of the individual bead models (Fig. 2E) revealed structural features in common with models derived from rigid body modelling procedures and from single particle EM (see below), showing that CASPR2-1261 maintained a compact overall architecture.
Rigid body modeling - At present, no high resolution structure of any CASPR2 domain is available. Therefore, using the SWISS-MODEL server, homology models were built for each domain, with ~90% overall coverage of the extracellular domain (1107 of 1234 amino acids) (Table 2). To obtain an atomistic structural model of CASPR2 that best fits the SAXS data, the program CORAL (41) was used. This program simultaneously refines the relative orientation and positioning of individual homology-modelled domains as rigid bodies against the SAXS data, while accounting for amino acids not present in the coordinate files by modeling them as poly-glycine inter-domain linkers. The resulting CORAL models (χ²: 0.42-1.24; correlation map p > 0.01) (Fig. 2E, ribbon representation) spatially superimpose with the individual ab initio bead models and show a similarly compact architecture. Several models were obtained that fit the SAXS data equally well, had a dmax of ~140 Å, and shared the common features of being overall compact, with a three-leaf clover arrangement at one end and two domains at the other end. However, probably due to its more symmetric architecture compared to α-neurexin, the SAXS data did not allow us to model the N- or C-terminal ends of the protein always in the same position with respect to the other domains (Fig. 2E). Although the model of the central fibrinogen-like domain is missing eight amino acid residues, its N- and C-terminal ends are spatially proximal, likely forcing the proximal and distal domains of CASPR2 into a compact overall architecture. This is in contrast with the extended shape of α-neurexin, which contains a central LNS domain in place of the fibrinogen-like domain (43). Finally, to best approximate the molecular volume/mass contributions of the highly hydrated sugar moieties, N-linked glycan moieties were added to all the Nx(S/T) sequons available in the homology-modelled domains during rigid body modeling (Fig. 3).
This model highlights the volume of the Man5GlcNAc2 carbohydrate occupancy of this molecule and suggests that when native glycosylation is added in the Golgi apparatus, an even larger volume occupied by the sugar moieties can affect the overall flexibility of the protein and the interaction with its ligand(s).
Single particle negative staining electron microscopy (EM) - SAXS provides a time- and ensemble-averaged representation of the conformation of a macromolecule in solution, while single particle EM can directly provide information regarding individual structures. To evaluate potential structural variability in the extracellular domain of CASPR2, the purified protein was fractionated by size exclusion chromatography and the fraction corresponding to the center of the peak was imaged using a transmission EM (Fig. 1B). CASPR2 particles were monodisperse and homogeneous in size and all shared a compact overall architecture, in good agreement with all other experiments presented thus far. Raw single particle EM imaging of the entire extracellular domain of CASPR2 showed that individual particles adopted an assortment of conformations illustrated by a variety of shapes (Fig. 4A-B). After aligning and averaging ~4,000 particles and classifying them into 51 classes, the majority clearly showed the presence of five distinguishable modules, which, judging from their size (~40 Å diameter), corresponded to the discoidin and the four LNS domains (Fig. 4B, E).
The central domain between residues ~592 and ~798 that contains a small fibrinogen-like sequence of unknown structure could not be identified. As for the two EGF domains (~35 amino acids each) their small size did not allow us to specifically identify them with this technique, as shown in similar experiments used to determine the α-neurexin structure (43). Although these images do not allow us to readily distinguish between conformational diversity and different positioning of the protein on the grid, numerous class average images of the extracellular domain of CASPR2 clearly showed features similar to the ab initio and rigid body modeling (Fig. 4C-D). In particular, we could distinguish the clover leaf arrangement at one end and two domains at the other end, suggesting that the three domains constitute the N-terminal region of the protein whereas the two-domain region likely represents the third and fourth LNS domains. In these experiments, the maximum dimension of the particles remains highly consistent with the SAXS measurements (~145Å). Taken together, all biophysical experiments shown here, including SEC, AUC, SAXS, and single particle EM illustrated that the extracellular domain of CASPR2 has an overall compact architecture. As all these data come from independent biophysical techniques, they strongly suggest that the observed three-dimensional organization is relevant in vivo.
CNTN1 is a novel ligand for CASPR2 - While testing the binding of CASPR2 to the contactin family of proteins (which includes CASPR2's putative ligand TAG-1), we found that contactin 1 (CNTN1) specifically associates with the extracellular domain of CASPR2. Using bio-layer interferometry (BLI) we immobilized CNTN1-993-Fc on an anti-human Fc sensor and used the purified extracellular domain of CASPR2-1261 to directly test the association (Fig. 5A). The binding between the extracellular domains of CASPR2 and CNTN1 displayed fast association and dissociation rates, with a calculated equilibrium dissociation constant, KD, of ~14 µM. When the phases were reversed by immobilizing CASPR2-Fc on the sensor tip and using purified CNTN1-993 (Fig. 5B), similar results were obtained.
To further investigate their association and determine whether CNTN1 is able to bind CASPR2 directly on the surface of live cells, we performed a cell based binding assay. HEK293 cells expressing HA-tagged full-length CASPR2 were incubated with up to 5 µM of purified CNTN1-993-Fc, and we found that the extracellular domain of CNTN1 bound to CASPR2 expressed on the cell surface. As a positive control, we incubated HEK293 cells expressing Neuroligin 1 with 1 µM of purified β-neurexin-Fc (20,44), and as a negative control we used 5 µM of the Fc portion alone (Fig. 5C).
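To give a sense of what a KD of ~14 µM implies, the following sketch evaluates a simple 1:1 binding isotherm, in which the fraction of immobilized ligand occupied at equilibrium is C/(KD + C). The 1:1 model and the function names are assumptions made for illustration; the text does not state which model was fitted to the BLI data.

```python
def fraction_bound(conc_um, kd_um):
    """Equilibrium occupancy of a 1:1 interaction at analyte concentration conc_um."""
    if conc_um < 0 or kd_um <= 0:
        raise ValueError("concentration must be >= 0 and KD must be > 0")
    return conc_um / (kd_um + conc_um)


def kd_from_rates(k_off_per_s, k_on_per_um_per_s):
    """KD (µM) from kinetic rate constants: KD = k_off / k_on."""
    return k_off_per_s / k_on_per_um_per_s


KD_UM = 14.0  # approximate value reported in the text

half = fraction_bound(14.0, KD_UM)   # 0.5 when the concentration equals KD
at_5um = fraction_bound(5.0, KD_UM)  # about 0.26 at 5 µM
```

Under this assumption, a concentration of 5 µM would occupy roughly a quarter of the available sites at equilibrium, which illustrates why micromolar concentrations of purified protein are needed to detect an interaction of this affinity.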
CASPR2 interacts with multiple domains of CNTN1 - Having confirmed the association of CASPR2 and CNTN1 by two independent techniques and measured the apparent affinity, we sought to determine which domains of each protein are responsible for the interaction. We engineered various deletion constructs for both CASPR2 and CNTN1 (see Table 1 for construct boundaries), expressed them in HEK293 cells, and used BLI to ascertain whether they retained their binding properties. Using this strategy, we narrowed the associating region of CNTN1 down to the central Ig5/FN1 construct (Fig. 6A, C, D). Neither the shorter construct Ig6/FN1 nor either half of the protein (the six Ig domains or the four FN3 domains) showed measurable association (Fig. 6A, C). Interestingly, for CASPR2, the construct encoding D1-6 was the shortest construct that showed CNTN1 association. We also tested smaller fragments such as D1, D2-3, D3-6, and D6-8; the lack of binding of these fragments suggests that the minimal binding domain of CASPR2 comprises the first six domains together (Fig. 6B-D).
To confirm these results, we again used the cell-based binding assay, with WT CASPR2 expressed on the surface of HEK293 cells and soluble CNTN1-Ig5/Fn1. We also reversed the experiment and used WT CNTN1 and soluble CASPR2-D1-6. In both cases association was confirmed, and the Fc alone was used as a negative control (Fig. 6E).
CNTN1 is an endogenous receptor for CASPR2 in hippocampal neurons - We next wanted to determine whether CNTN1 is a receptor for CASPR2 in cultured primary neurons. CASPR2-1261-Fc fragments were loaded onto microbeads through the Fc portion (or the Fc alone as a control) and subsequently incubated with hippocampal neurons after 14 days in vitro (DIV14). After washing and fixing the plated neurons, only the CASPR2 fragments that interacted with a ligand exposed on the surface of the neurons remained bound and could be visualized using microtubule-associated protein-2 (MAP2) as a marker for the neurons (Fig. 7A), whereas in the absence of a ligand (e.g. beads loaded with the Fc control), beads were removed by the washes (Fig. 7B). To ascertain whether CNTN1 was a ligand when expressed on the surface of neurons and glial cells, we transfected WT CNTN1 in hippocampal neurons and repeated the binding experiments using CASPR2-1261 or, as a negative control, the deletion construct CASPR2-D3-6 (Fig. 7C-D). Under these conditions, beads loaded with CASPR2-1261 clearly co-localized with the CNTN1-transfected neurons, whereas beads loaded with CASPR2-D3-6 were removed by the washes, similarly to the Fc-alone control. Because only the domains positive for the CNTN1 association in vitro were bound to the neurons in culture, this experiment suggests that CNTN1 is one endogenous ligand for CASPR2.
Binding of CASPR2 with the other CNTNs - To test whether CASPR2 binds to other members of the CNTN family, we expressed CNTN1 through CNTN6 and tested their binding to the extracellular domain of CASPR2 using BLI. Despite a 40-49% amino acid identity among the CNTN isoforms, we did not detect any association with any other CNTN family member, including its putative ligand TAG-1 (CNTN2) (2-4) (Fig. 8A-B). Although the binding experiments with TAG-1 were replicated in various buffer conditions (e.g. with or without Ca2+ and Mg2+) and with different batches of proteins, we never detected any interaction.

DISCUSSION
Because of the importance of CASPR2 in human brain development (45) and to complement our previous biochemical work (7), we sought to understand the structure of CASPR2 and the molecular aspects of CASPR2's interaction with its new ligand CNTN1. Using a combination of independent biophysical and biochemical techniques, we report here major new findings on the overall architecture of the extracellular domain of CASPR2 and its association with CNTN1. First, we describe what are, to our knowledge, the first structural models of the extracellular domain of CASPR2: despite similarities in its domain composition and organization with αNRXN, our data indicate that the two proteins have distinct tertiary structures, as the extracellular domain of CASPR2 is compact, with a likely dominant three-domain clover-leaf feature and two domains somewhat extended. In contrast, the extracellular domain of αNRXN displays a significantly more elongated, "L-shaped" conformation (43,46,47). Second, our single particle EM data suggest that, within an overall compact architecture, the extracellular domain of CASPR2 may adopt a variety of tertiary arrangements, although a major proportion of the images processed are consistent with the arrangement that best fits the solution scattering data. Third, by expressing the proteins in HEK293 GnTI- cells and by using mass spectrometry and SAXS, we were able to model most of the glycan structures present on the extracellular domain of CASPR2 and demonstrate that they constitute a significant portion of the protein. Fourth, we show here for the first time that CNTN1 is an endogenous ligand that binds CASPR2 with µM affinity. Surprisingly, under the same experimental conditions none of the other members of the CNTN family, including the putative ligand TAG-1, directly interacts with CASPR2.
Structural characterization - Despite the sequence similarity with αNRXN, we present AUC, single-particle EM, and SAXS data that unambiguously reveal that the extracellular domain of CASPR2 is monomeric and more compact in shape than αNRXN, with a maximum dimension of ~140 Å (vs ~170 Å for αNRXN), as consistently shown by SAXS rigid body modeling. In particular, because the central fibrinogen-like sub-domain is unique to CASPR2, we speculate that this sub-domain may be in part responsible for the compact three-dimensional shape of the extracellular domain of CASPR2. Although the identity of the sub-domains cannot be determined by SAXS, EM class-averaged images show a group of two sub-domains, likely composed of the last two LNS sub-domains, and a group of three sub-domains, likely composed of the discoidin and the first two LNS sub-domains. The fact that the data from independent solution scattering, AUC, and single particle EM techniques agree strongly and overlay well indicates that these conformations are native and likely drive CASPR2's biological function. Finally, this study directly confirms the monomeric nature of the autism-associated mutant CASPR2-1253* (5,6) that we recently described (7). In that published work, comparative SEC analysis of CASPR2-1253* and the CASPR2-1261 truncation construct showed that these two proteins had virtually identical elution volumes, thus indicating equivalent overall shape and oligomerization state.
Single particle EM suggests that the extracellular domain of CASPR2 may exhibit some polymorphism in the arrangement of the individual sub-domains. Such flexibility is common to many proteins with a modular architecture and may well be functionally important, as it allows the various domains to sample the three-dimensional volume, to adapt structurally to the engagement with multiple binding partners, and possibly to signal through the cell membrane. As discussed in more detail below, the flexibility of the extracellular domain of CASPR2 may be relevant to the µM affinity measured for CNTN1.
Unlike protein crystallography, where extensive glycosylation usually constitutes a barrier to crystallogenesis because of its hydration and flexibility, SAXS experiments are not restricted by the degree of glycosylation of protein samples, and the contribution of carbohydrate moieties to the overall structure can be modeled. In this work, both ab initio and rigid body modeling procedures take into account the structural and volumetric contribution of the N-linked glycosylation in the definition of the final structures (Fig. 2 and Fig. 3). In the analysis of these structural models we highlight the total amount of glycans because they constitute a large fraction of the mass of the protein and they likely influence the expression, folding, and solubility of CASPR2. In our models, we added Man5GlcNAc2 moieties to our sub-domains because the protein used for structural determinations was produced by GnTI- cells. However, neurons probably add larger and more complex types of glycans, and therefore the relative mass contributed by the native N-linked glycosylation is probably larger than the one shown in this study. More extensive glycosylation is also likely to have an important impact on the architecture (e.g. flexibility) and ligand recognition of CASPR2.
CNTN1 is a novel ligand for CASPR2 - In testing the binding affinity of CASPR2 for the CNTN family, which includes the putative ligand TAG-1 (CNTN2), we found that only CNTN1 binds to the extracellular domain of CASPR2, and it does so with low affinity (dissociation constant in the µM range). This type of affinity is typical of the extracellular interactome (48), whereas nM affinities are less common in this class of molecules. The fact that for CNTN1 we could not measure any association using the six Ig or the four FN3 domains alone (CNTN1-Ig1-6 or CNTN1-Fn1-4) suggests that the minimal binding domain requires a combination of Ig and FN3 domains. Using several deletion constructs of both CASPR2 and CNTN1 enabled us to identify CNTN1 Ig5/Fn1 and CASPR2 D1-6 as the minimal binding cassette. The requirement for multiple domains is not unusual, especially for proteins containing Ig and FN3 domains (49). Our data indicate that the recombinant, purified extracellular domains of CASPR2, CNTN1 and TAG-1 are correctly folded because they appear non-aggregating (monodisperse) and without degradation in SEC experiments and SDS-PAGE, and because their expression levels were comparable to those of well-folded proteins with which we have worked in the past (18,43). Although we cannot exclude that CASPR2 specifically binds to other receptors in neurons, the use of deletion constructs in the neuronal experiments suggests that CNTN1 is an important endogenous ligand for CASPR2. Moreover, the positive binding in these experiments indicates that the CASPR2 constructs are well folded. Remarkably, under the same experimental conditions TAG-1 (CNTN2) does not seem to associate with CASPR2. Because the binding of the CASPR2/TAG-1 pair was detected with non-quantitative techniques (e.g. immunoprecipitation) (2)(3)(4), its affinity is currently unknown; one possibility is that the TAG-1/CASPR2 interaction is significantly weaker than the interaction with CNTN1, and that we are therefore unable to detect it by BLI. Another possibility is that the complex associates through the interaction with a third protein that is currently unknown.
Although many open questions on the in vivo functions of CASPR2 remain, our structural models, the observed interaction with CNTN1, the extensive glycosylation, and the conformational heterogeneity offer new insights into the structure-function relationship of these two neuronal proteins. Moreover, because CNTN1 is expressed by both neurons and glial cells, the discovery of CNTN1 as a new CASPR2 ligand suggests a complex role of CASPR2 in the nervous system.

[...] Both parameters remain linear, as expected for an optimal protein preparation with virtually no non-ideal effects. D) The P(r) function of CASPR2-1261 indicates the distribution of all interatomic distances and the maximum dimension of the particle in Å. The statistical quality of the data can be assessed by the standard error bars; some estimated errors are smaller than the symbols. See also Table 3 for a complete report of other SAXS parameters. E) Overlay of six distinct DAMMIN models obtained by ab initio reconstructions (white bead representation) with six models obtained with rigid body modeling (ribbon representation) to highlight the similarity between the model pairs. N and C, N- and C-termini of the protein.
[...] figure 2E) in white ribbon representation. The model is shown in two orientations to highlight the carbohydrate contribution (blue spheres, ref. 50) to the total mass of the protein and the few residues missing in the modeling procedures (orange spheres). Red numbers refer to the individual domains as described in Figure 1A. N and C, N- and C-termini of the protein.
[...] The size of each box of the class averages is 32 × 32 nm. C and D) Overlay of one EM class average with an ab initio reconstruction (C) or one of the rigid body models obtained with Coral (D) to show the high similarity of shapes and models obtained with these independent procedures. In this panel, we have labelled the domains resulting from the rigid body modeling. Scale bar is 7.5 nm. E) Additional 51 single particle EM classes show the breadth of the conformational variability and orientation on the grid of the extracellular domain of CASPR2.
FIGURE 5. CASPR2 and CNTN1 association experiments - A) BLI experiment of the association between the extracellular domain of CNTN1-993 and the purified CASPR2-1261. Five concentrations of CASPR2 (75 µM to 4.7 µM in two-fold dilutions) were used to determine the affinity of the association, calculated in the inset. The global association and dissociation rate constants obtained from these curves are ka = 1.461 × 10^4 M^-1 s^-1 and kd = 3.29 × 10^-1 s^-1. B) SEC profile of purified CNTN1 used for BLI experiments. The elution volume of this protein indicates that, similarly to TAG-1, CNTN1 is likely a dimer in solution. C) Cell-based binding assay between full-length HA-CASPR2 transfected in HEK293 cells and purified CNTN1-993 added at 3 µM. The Fc/HA-CASPR2 pair was used as a negative control, whereas NLGN1/β-NRXN1 was used as a positive control. Hoechst stain was used to visualize all cells. Scale bar = 10 µm.
FIGURE 6. Determination of the associating domains between CASPR2 and CNTN1 - A) BLI curves of purified CASPR2 (30 µM) bound to CNTN1-993 or shorter CNTN1 constructs immobilized on a sensor tip, highlighting the associating constructs. B) BLI curves of purified CNTN1-993 (30 µM) with immobilized CASPR2-1261 or shorter CASPR2 constructs, highlighting the associating constructs. C) Diagram of the various CASPR2 and CNTN1 deletion constructs that were tested by BLI and their binding results. Dotted lines, no detectable binding; solid lines, associating fragments. D) Western blot of the deletion constructs used to determine the minimal binding cassette by BLI. E) Cell-based binding assay between the extracellular domain of either full-length HA-CASPR2 or full-length FLAG-CNTN1 and the purified fragments (3 µM) that retained binding by BLI. The Fc fragment alone was used as a negative control. Hoechst stain was used to visualize all cells. Scale bar = 10 µm.
FIGURE 7. Experiments in cultured hippocampal neurons suggesting an interaction between CASPR2 and endogenous CNTN1 - A) Magnetic beads (1 µm) were loaded with CASPR2-1261 and incubated with hippocampal neurons before being fixed and stained. Note that the beads remain attached to the neuronal branches. B) Beads loaded with Fc alone, used as a control, were incubated with neurons in identical wells. Because of the lack of association, most of the beads were lost during the washes. C) When neurons are transfected with FLAG-CNTN1, CASPR2-1261-loaded beads bind strongly to the cultured neurons. D) Similarly to control beads, beads loaded with the non-interacting deletion construct CASPR2-D3-6 did not bind to the neurons or to the transfected CNTN1. Scale bar = 10 µm. MAP2 was used to outline the neurons.