Crystal Structure of a Bony Fish β2-Microglobulin

Three-dimensional structures of β2-microglobulin (β2m) from chicken and various mammals have been described previously, but aside from genomic sequences, very little is known about the three-dimensional structures of β2m in species other than warm-blooded vertebrates. Here, we present the first three-dimensional structure of a β2m from a bony fish, the grass carp (Ctid-β2m), determined at 2.1 Å resolution. The key structural differences between this new structure and previously published ones are two new hydrogen bonds linking Ile37 and Glu38 in strand C to Lys66 in strand E, and a hydrophobic pocket near the center of the protein. Importantly, Ctid-β2m has a short D strand and a long loop between strands C and D; this flexible region, rather than the D1 and D2 strands found in other β2m structures, is a putative binding region for the major histocompatibility complex heavy chain. Compared with bovine and human β2ms, Ctid-β2m has Cα root mean square deviations of 1.3 Å and 1.8 Å, respectively. Compared with the constant domains of the lamprey T cell receptor-like receptor (Lamp-TCRLC) and the amphioxus V and C domain-bearing protein (Amphi-VCPC), Ctid-β2m exhibits a very different topology. The predicted three-dimensional structures of the Amphi-VCPC and Lamp-TCRLC domains clearly lack the A strand of β2ms. The 18 amino acids at the N terminus of Amphi-VCPC may have evolved into strand A of β2ms, and a mutation in the BC loop of Amphi-VCPC may have led to the novel topology found in β2m. Based on these results, Ctid-β2m may well reflect evolutionary characteristics of ancestral C set molecules.

β2-Microglobulin (β2m) is classified as a member of the immunoglobulin superfamily (IgSF) constant (C) set based on its amino acid sequence, number of strands, and folding topology (1,2). Proteins containing IgSF C domains exist in numerous organisms, but only two forms have been found in the immune system. In the first form, IgSF C domains are combined with IgSF variable (V) domains in multidomain molecules such as the B cell receptor, the major histocompatibility complex (MHC), and the T cell receptor (TCR) (3). The second form consists of independent molecules, such as β2m (4,5). Both forms, the fused and the independent molecules, play important roles in the adaptive immune system. β2m associates noncovalently with MHC class I molecules and stabilizes the three-dimensional structure of their heavy chain, which is required for binding foreign antigen peptides and for supporting T helper lymphocytes, cytotoxic T lymphocytes, and natural killer cells during the immune response (6-8). β2m genes are conserved across species, including humans and other mammals, and birds. Their genomic structure comprises four exons and three introns. The β2m genes are not linked to the MHC and are located outside the MHC region (9).
The three-dimensional structure of bovine β2m was first solved by Becker et al. (5). To date, the structures of human, mouse, and chicken β2m have been determined either as monomers or as part of MHC I complexes (4,5,10,11). The known β2m structures contain 99 mature residues in a seven-stranded β-sandwich fold and thus belong to the typical IgSF C1 set. In these structures, strands A, B, and E form one β-sheet; strands C, F, and G form the second β-sheet; and the D strand runs between the layers. A central disulfide bond bridging Cys25 on strand B and Cys80 on strand F contributes to protein stability (14). The D strand is divided by a two-residue β-bulge into two short two-residue β-strands (D1 and D2). This β-bulge has been found in all solved β2m structures and is considered a common feature of β-sheet proteins. Loss of the β-bulge, together with changes in the positions of three residues in the CD loop linking the C and D strands, results in a new continuous six-residue β-strand. Structural rearrangements of the D strand by a possible edge-to-edge mechanism have been implicated in human β2m amyloidosis (12). Interestingly, IgSF C domains are not only highly conserved in the B cell receptor, TCR, and MHC class I and II molecules but also appear in FcγRI, CD2, CD4, and CD8; C domains have even been found in lower animals, such as in the lamprey TCR-like receptor (Lamp-TCRLC) and in the amphioxus V and C domain-bearing protein (Amphi-VCPC) (13,14). However, it has generally been believed that β2ms emerged suddenly in the antigen presentation system of the adaptive immune system and are present only in jawed vertebrates. Very little is known about the three-dimensional structure of β2m in bony fishes, an early vertebrate group, and its evolutionary origin therefore cannot be explained adequately.
Fish β2m genes were first reported by Ono and co-workers (15) and by Dixon et al. in 1993 (16). To date, β2m genes have been reported in 16 fish species, including grass carp (17), zebrafish (15), carp and tilapia (16), trout (9), channel catfish (18), Atlantic salmon (19), and sturgeon (20). Fish β2m genes (except that of sturgeon) encode 116 amino acids, three residues fewer than the β2ms of humans, other mammals, and chickens. Fish β2m genes have a number of unique characteristics not shared by their mammalian homologues, such as a deletion of two amino acids in the mature protein and a single N-linked glycosylation site in grass carp (17) and catfish (18). In addition, as first discovered in rainbow trout, the fish β2m locus consists of three linked genes: two similar expressed genes and one incomplete gene that is not expressed (9).
We previously reported the β2m gene of the grass carp (Ctid-β2m), a representative bony fish (17). Here, we determined the first three-dimensional structure of Ctid-β2m at 2.1 Å resolution. Ctid-β2m closely resembles the previously described human and bovine monomeric β2m structures, with some significant variations. Two new hydrogen bonds and a hydrophobic pocket were found in Ctid-β2m. In particular, the structure revealed an unusually flexible conformation in the region of the CD loop. This structure highlights an evolutionary propensity toward stability in bony fish β2m, together with an unusually flexible CD loop. Because fish β2m evidently represents an evolutionary turning point for the IgSF C set, we predicted the three-dimensional structures of Lamp-TCRLC and Amphi-VCPC and compared their topologies with that of Ctid-β2m. Based on the structures presented here, Ctid-β2m may well reflect evolutionary characteristics tracing back to ancestral C set molecules.

EXPERIMENTAL PROCEDURES
Protein Cloning, Expression, and Purification-The Ctid-β2m gene was amplified by PCR from plasmid p2X-Ctid-β2m, which was constructed previously by our research group (21). The construct contains a unique NdeI restriction site, a stop codon, and a unique XhoI restriction site (21,22). The PCR products were ligated into the pET21a vector (Novagen) and transformed into Escherichia coli strain BL21(DE3) for protein expression. The recombinant protein was expressed as inclusion bodies; the cells were lysed by sonication and centrifuged at 2,000 × g. The pellet was washed three times with a Triton X-100-containing buffer (20 mM Tris-HCl, 100 mM NaCl, 1 mM EDTA, and 1 mM dithiothreitol) and then once with the same buffer without Triton X-100. The inclusion bodies were dissolved overnight in urea buffer (8 M urea, 50 mM Tris-HCl, pH 8.0, 100 mM NaCl, 10 mM EDTA, 10% (v/v) glycerol, 10 mM dithiothreitol), using ~1 ml of urea buffer per 30 mg of protein. Ctid-β2m was refolded by gradual dilution into refolding buffer (100 mM Tris-HCl, 2 mM EDTA, 400 mM L-arginine-HCl, 0.5 mM oxidized glutathione, 5 mM reduced glutathione, 0.1 mM phenylmethylsulfonyl fluoride, and 0.1 mM NaN3, pH 8.0). After stirring for 48 h at 4 °C, the remaining soluble protein was concentrated and purified by size-exclusion chromatography on a Superdex G-200 column (Amersham Biosciences), followed by Resource-Q (Amersham Biosciences) ion-exchange chromatography (21,23).
Crystallization, Data Collection, and Processing-The purified Ctid-β2m was adjusted to 10 mg/ml in crystallization buffer (10 mM Tris-HCl, 10 mM NaCl). Initial crystallization trials were set up with Crystal Screens I and II (Hampton Research) at 18 °C by the hanging drop method. Each drop, containing equal volumes (1 μl each) of protein solution and reservoir solution, was placed over a well containing 200 μl of reservoir solution in VDX plates. Crystals suitable for data collection grew in 3-5 days under optimized conditions (0.1 M MES, pH 6.5, 12% polyethylene glycol 20000, 3% (v/v) ethanol) at a protein concentration of 5 mg/ml. For data collection, the crystals were soaked for several minutes in reservoir solution supplemented with 20% glycerol as a cryoprotectant and then flash cooled directly in liquid nitrogen. X-ray diffraction data were collected to 2.1 Å resolution on a Rigaku MicroMax007 rotating-anode x-ray generator operated at 40 kV and 20 mA (Cu Kα; λ = 1.5418 Å) and equipped with an R-AXIS VII 2+ image plate detector. The data were processed and scaled with DENZO and SCALEPACK as implemented in HKL-2000 (24). Data collection statistics for the Ctid-β2m crystals are shown in Table 1.
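As a worked check on the data collection geometry, first-order Bragg's law, λ = 2d sin θ, relates the Cu Kα wavelength (λ = 1.5418 Å) to the 2.1 Å resolution limit. The Python sketch below computes the scattering angle 2θ at which 2.1 Å reflections are recorded; the function name `bragg_two_theta` is hypothetical, and the calculation is an illustration rather than part of the HKL-2000 processing described above.

```python
import math


def bragg_two_theta(wavelength_A: float, d_spacing_A: float) -> float:
    """Return the scattering angle 2θ in degrees for a lattice spacing d.

    Uses first-order Bragg's law, wavelength = 2 * d * sin(theta).
    (Illustrative helper; the name is hypothetical.)
    """
    sin_theta = wavelength_A / (2.0 * d_spacing_A)
    # Spacings below wavelength / 2 cannot satisfy Bragg's law at any angle.
    if not 0.0 < sin_theta <= 1.0:
        raise ValueError("d-spacing is not reachable at this wavelength")
    return 2.0 * math.degrees(math.asin(sin_theta))


if __name__ == "__main__":
    # Cu K-alpha radiation and the 2.1 Å limit reported for Ctid-β2m
    print(f"2θ at 2.1 Å: {bragg_two_theta(1.5418, 2.1):.1f} degrees")
```

With these inputs the angle comes out at roughly 43°. The helper rejects spacings below λ/2 (about 0.77 Å for Cu Kα), which no scattering angle can reach.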
Structure Solution, Refinement, and Analysis-The crystal structure of Ctid-β2m was solved by molecular replacement with CNS, using human β2m (PDB code 1LDS) as the search model (24,25). Residues that differ between Ctid-β2m and the search model were rebuilt manually in O (26), guided by Fo − Fc and 2Fo − Fc electron density maps. After refinement in CNS using simulated annealing, energy minimization, and restrained individual B factors, and after the addition of 196 water molecules, Rwork and Rfree dropped to 19.3% and 22.34%, respectively, for all data between 35 and 2.1 Å. The course of refinement was monitored by calculating Rfree on a test subset containing 3% of the unique reflections. The coordinate error of the Ctid-β2m structure, estimated by a Luzzati plot in CNS (21,22), is 0.41 Å. The average real-space fit value for Ctid-β2m, as calculated by O (26), is 0.95. Model geometry was verified with PROCHECK (27).
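The Rwork and Rfree values above follow the standard crystallographic residual R = Σ||Fobs| − k|Fcalc|| / Σ|Fobs|, with Rfree evaluated only on a test set of reflections withheld from refinement. The minimal Python sketch below illustrates that bookkeeping. The function names are hypothetical, and the single overall scale factor k fitted on the working set is a simplifying assumption; CNS applies more elaborate scaling and bulk-solvent corrections.

```python
import random
from typing import List, Sequence, Tuple


def flag_test_set(n_reflections: int, fraction: float, seed: int = 0) -> List[bool]:
    """Randomly flag a fraction of reflections as the Rfree test set (hypothetical helper)."""
    n_test = max(1, round(n_reflections * fraction))
    test_idx = set(random.Random(seed).sample(range(n_reflections), n_test))
    return [i in test_idx for i in range(n_reflections)]


def r_factor(f_obs: Sequence[float], f_calc: Sequence[float], k: float) -> float:
    """R = sum(| |Fo| - k|Fc| |) / sum(|Fo|) over the given reflections."""
    numerator = sum(abs(abs(fo) - k * abs(fc)) for fo, fc in zip(f_obs, f_calc))
    return numerator / sum(abs(fo) for fo in f_obs)


def r_work_r_free(f_obs: Sequence[float], f_calc: Sequence[float],
                  is_test: Sequence[bool]) -> Tuple[float, float]:
    """Return (Rwork, Rfree); the scale k is fitted on working reflections only (assumption)."""
    work = [(fo, fc) for fo, fc, t in zip(f_obs, f_calc, is_test) if not t]
    test = [(fo, fc) for fo, fc, t in zip(f_obs, f_calc, is_test) if t]
    k = sum(abs(fo) for fo, _ in work) / sum(abs(fc) for _, fc in work)
    r_work = r_factor([fo for fo, _ in work], [fc for _, fc in work], k)
    r_free = r_factor([fo for fo, _ in test], [fc for _, fc in test], k)
    return r_work, r_free
```

For example, `flag_test_set(n, 0.03)` sets aside a 3% test set as described in the text. Because the test reflections never influence the model, Rfree is normally somewhat higher than Rwork, as with the 22.34% and 19.3% reported here.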
Homology Modeling of the IgSF C Domains in Jawless Vertebrates and Protochordates-Although fusion molecules containing IgSF C domains exist widely in the adaptive immune system, only two proteins, Amphi-VCPC and Lamp-TCRLC, have been predicted to possess an IgSF C domain in jawless vertebrates and protochordates (13,14). Because the three-dimensional structures of Amphi-VCPC and Lamp-TCRLC are not yet available but are necessary for determining the evolutionary origin of the IgSF C set, we predicted them by homology modeling with the SWISS-MODEL server, using as templates the existing Protein Data Bank structures of the ligand binding protein Lingo-1 (PDB code 2ID5) and the T cell surface glycoprotein CD4 (PDB code 2NY4). DNA-MAN was used to analyze the differences among these molecules, and the PyMOL Molecular Graphics System (DeLano Scientific) was used for figure preparation.

FIGURE 2. Analysis of Ctid-β2m with β2ms from fish, chicken, and mammals. A, sequence alignment of Ctid-β2m with β2ms from fish, chicken, and mammals. The sources of the sequences are as follows: Ctid-β2m (GenBank™ accession no. AB190815, PDB code 3GBL), zebrafish (GenBank™ accession no. L05383), trout (GenBank™ accession no. L49056), catfish (GenBank™ accession no. AF016042), salmon (GenBank™ accession no. AF180488), chicken (GenBank™ accession no. M84767), bovine (GenBank™ accession no. NM_173893), mouse (GenBank™ accession no. NM_009734), and human (GenBank™ accession no. NM_004048). Numbers above the alignment denote Ctid-β2m residue numbers. Black arrows above the alignment indicate β-strands. T, turn. Residues highlighted in red are absolutely conserved, whereas those in blue boxes are highly conserved (80%). Green numbers denote residues that form disulfide bonds. The alignment was generated with Clustal X and drawn with ESPript. B, structural superposition of Ctid-β2m with the chicken β2m monomer. Ctid-β2m and chicken β2m are colored orange and green, respectively. C, structural superposition of Ctid-β2m with human and bovine β2m monomers. Ctid-β2m, human, and bovine β2ms are colored orange, green, and red, respectively. The superpositions were created in PyMOL using the Cα atoms of the globular segment. PDB codes: Ctid-β2m, 3GBL; chicken, 3BEW; HLA β2m, 1LDS; bovine, 1BMG.

RESULTS
Overall Structure of Ctid-β2m-The mature Ctid-β2m contains 97 amino acids (compared with 99 in human and mouse β2m). As expected, Ctid-β2m (PDB code 3GBL) is similar to monomeric bovine β2m (PDB code 1BMG) and human β2m (PDB code 1LDS), with Cα root mean square deviations of 1.3 and 1.8 Å, respectively. The chain folds into a typical "β-barrel" configuration dominated by two antiparallel pleated sheets, one composed of four strands and the other of three strands (Fig. 1A). The Ctid-β2m structure thus consists of two face-to-face β-sheets of different sizes: the "large β-sheet" is composed of strands A (residues 6-11), B (residues 21-28), D (residues 55-57), and E (residues 60-70), and the "small β-sheet" is formed by strands C (residues 36-41), F (residues 78-84), and G (residues 87-92). Six loop regions (AB, BC, CD, DE, EF, and FG) connect these strands. The two β-sheets are linked by a Cys25-Cys80 disulfide bridge, which is highly conserved in all known β2ms and provides strong geometric constraints on the surrounding residues. A cluster of hydrophobic residues (Ile7, Ile9, and Tyr10 on strand A; Leu23, Ile24, Tyr26, and Val27 on strand B; Ile37, Leu39, and Leu40 on strand C; Phe54 on strand D; Leu64, Thr65, Val68, Trp69, and Phe70 on strand E; Tyr78 and Val82 on strand F; and Thr88 and Val92 on strand G) spans both the large and small β-sheets and is likely to form a strong hydrophobic pocket (Fig. 1B).
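The Cα root mean square deviations quoted here summarize how far equivalent Cα atoms lie from each other once two structures are superposed: RMSD = sqrt((1/N) Σ |rᵢ − r′ᵢ|²). The Python sketch below shows only that final step. It assumes the residue correspondence and the optimal superposition have already been obtained (for example, from a structural alignment program such as DALI or a PyMOL superposition), and the function name `ca_rmsd` is hypothetical.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def ca_rmsd(coords_a: Sequence[Coord], coords_b: Sequence[Coord]) -> float:
    """RMSD between matched, already-superposed Cα coordinates (in Å).

    coords_a[i] and coords_b[i] must be equivalent residues; no fitting is done here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared / len(coords_a))
```

Because the result depends on which residue pairs are counted as equivalent, RMSDs computed over different residue sets (for instance, the whole chain versus only the globular segment) are not strictly comparable.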
Comparison of the Three-dimensional Structures of β2ms in Vertebrates-An alignment of Ctid-β2m with zebrafish, trout, catfish, salmon, chicken, bovine, mouse, and human β2ms revealed 37 identical residues (38% identity). When Ctid-β2m is excluded, 48 residues are identical among the remaining β2ms (48% identity) (Fig. 2A). Ctid-β2m contains 20 residues that differ from those of all the other homologues; these residues are scattered throughout the β-sheets and loops. Previous studies have shown that most teleost β2ms are mature proteins of 97 amino acids, which differ from mammalian β2ms by two deletions at positions 91 and 92 (9,17). Relative to human β2m, bovine monomeric β2m has been reported to carry an additional deletion at position 48 (28). The function of these deletions is as yet unknown. The strands of Ctid-β2m differ slightly from those of previously described β2ms. Strand B spans residues 21-28, as in mouse β2m, whereas in human β2m this strand comprises residues 21-30. Strand G is longer than in other described β2m structures. Because of the two-residue deletion at positions 91 and 92 in Ctid-β2m, loop FG is shorter than its human counterpart (Fig. 2, B and C). Strand E is two residues longer than in human β2m, spanning residues 60-70 rather than 62-70. Compared with bovine and human β2ms, the overall Cα root mean square deviations of Ctid-β2m, calculated with DALI, are 1.3 and 1.8 Å, respectively (Table 1). The main differences between Ctid-β2m and human β2m lie in the loops from His11 to Asn20 and from His83 to Lys86, which correspond to Arg13-Phe23 and Asn84-Lys92 in human β2m (PDB code 1LDS).
Compared with bovine β2m, the loops of Ctid-β2m differ in sequence from Glu56 to Trp59 and from His83 to Lys86, which correspond to Ser56-Ser60 and Lys82-Arg90 in bovine β2m (Fig. 2). Together, these data indicate small differences in β-sheet composition among species.
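The identity figures above count alignment columns in which every sequence carries the same residue, expressed relative to a sequence length. The Python sketch below shows that count. The helper names are hypothetical, gapped columns are treated as non-identical, and dividing by the 97-residue mature Ctid-β2m is our assumption (37/97 rounds to the 38% reported above).

```python
from typing import Sequence


def count_identical_columns(alignment: Sequence[str]) -> int:
    """Count columns where all aligned sequences share one residue (gaps never count)."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("aligned sequences must all have the same length")
    identical = 0
    for column in zip(*alignment):
        first = column[0]
        if first != "-" and all(residue == first for residue in column):
            identical += 1
    return identical


def percent_identity(n_identical: int, reference_length: int) -> float:
    """Express an identical-column count as a percentage of a reference length."""
    return 100.0 * n_identical / reference_length


toy = ["ACD-E", "ACDKE", "ACQKE"]  # hypothetical three-sequence alignment
print(count_identical_columns(toy))     # 3 (columns A, C, E)
print(round(percent_identity(37, 97)))  # 38
```

A column that is identical across all sequences stays identical when one sequence is removed, so dropping Ctid-β2m from the alignment can only keep or raise the count, consistent with the increase from 37 to 48 identical residues.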
Two Additional Hydrogen Bonds Found in Ctid-β2m-A total of ~60 intramolecular hydrogen bonds (mostly main chain to main chain) stabilize folded human β2m (28). The hydrogen bonds between strands D and E are conserved between Ctid-β2m and human β2m, although the residues involved differ: Ctid-β2m forms hydrogen bonds between Gln50-Ser67, Thr52-Thr65, and Ala55-His63, whereas the corresponding bonds in human β2m are Glu50-Tyr67, Ser52-Leu65, and Ser55-Tyr63. Asp53 is highly conserved in all β2ms and forms important hydrogen bonds to the heavy chain in the human HLA-β2m complex structure. In human β2m, Asp53 is located in the β-bulge and forms hydrogen bonds to Gln32, Arg35, and Arg48 in the α1 domain of the HLA heavy chain; the counterpart residues in the grass carp heavy chain are Gln31, Tyr34, and Lys45. These three residues (Gln32, Arg35, and Arg48) are also highly conserved in Ctid-β2m. However, two new hydrogen bonds appear in Ctid-β2m between the directly adjacent C and E strands (Fig. 3A): Ile37 and Glu38 in strand C each form a hydrogen bond with Lys66 in strand E. One hydrogen bond is formed between a hydrogen atom of the ε-amino group of Lys66 and the carbonyl oxygen of Ile37. The other is formed between the same hydrogen atom and an oxygen atom of the γ-carboxyl group of Glu38. The lengths of these hydrogen bonds are 2.75 and 3.54 Å, respectively, whereas the distance between the two strands is 7.65 Å. Lys66 is a fish-specific residue not present in mammalian and chicken β2ms, whereas Ile37 is highly conserved (Fig. 3B). The disulfide bond formed by Cys25 and Cys80 between strands B and F is highly conserved among all β2ms, including Ctid-β2m. The two new hydrogen bonds between strands C and E, which have not been found in any other resolved β2m structure, together with the newly described hydrophobic pocket, indicate a stable interaction between strands C and E in Ctid-β2m.

TABLE 1. Statistics for data and refinement of Ctid-β2m
Values in parentheses are given for the highest resolution shell. Rfree is calculated over reflections in a test set (5%) not included in atomic refinement. r.m.s., root mean square.

The Unusually Flexible CD Loop in Ctid-β2m-The most structurally important part of β2m is strand D, which binds to the surface of the MHC heavy chain (6,23). Residues 50-56 (strands D1 and D2) interact with the heavy chain and form a region made up of two small strands: residues Glu50-His51 form D1, and Ser55-Phe56 form D2. Between D1 and D2, a prominent β-bulge is formed by Asp53-Leu54 in the human structure (6). This β-bulge is found in all β2m structures described to date, including monomeric bovine β2m (5). In Ctid-β2m, however, strand D spans only positions 55-57, and only a short β-strand was observed. Ctid-β2m thus differs from human monomeric β2m, in which a longer strand spans positions 51-56. Much attention has been paid to the short D strand region, which can be involved in contacts with the heavy chain. A rather flexible region extending from the CD loop to the D strand was found in Ctid-β2m (Fig. 4). Thus, Ctid-β2m appears to form an unusually flexible region, which contributes greatly to the instability of its MHC class I complex.
Comparison of Ctid-β2m with Lamp-TCRLC and Amphi-VCPC-In jawless vertebrates, Lamp-TCRLC, which possesses an IgSF C domain, shares 20.7% amino acid identity with Ctid-β2m (13). To determine the evolutionary origin of the IgSF C set, the three-dimensional structure of Lamp-TCRLC was predicted by homology modeling. Using the first approach mode in SWISS-MODEL, the structure of the Lingo-1 molecule (PDB code 2ID5) (29) was found to share 24.68% amino acid identity with Lamp-TCRLC. The E-value of the resulting Lamp-TCRLC model is 2.20 × 10⁻⁸. The predicted structure of Lamp-TCRLC comprises 76 residues (amino acids 19-94) forming a six-stranded β-sandwich fold (Fig. 5A), but it is not a typical IgSF C molecule. In this model, strands A (residues 20-24) and C (residues 56-59) form one β-sheet, strands B (residues 34-38), E (residues 73-81), and F (residues 84-92) form the second β-sheet, and strand D (residues 66-69) runs between the layers (Fig. 5B). A central disulfide bond between Cys25 on loop AB and Cys76 on strand E may stabilize the protein. Strand D lies on top of the two β-sheets. Altogether, the topology of Lamp-TCRLC is very different from that of Ctid-β2m (Fig. 5C).
In protochordates, only a single gene, referred to as Amphi-VCP, has been predicted to encode an IgSF C domain (14). Amphi-VCP shares 17.6% amino acid identity with Ctid-β2m. To further determine the evolutionary origin of the IgSF C set, the three-dimensional structure of Amphi-VCPC was predicted by homology modeling based on part of a CD4 complex (PDB code 2NY4, chain B) (30). Amphi-VCPC shares 16.88% identity with this human T cell surface glycoprotein CD4 molecule. The E-value of the resulting model is 1.00 × 10⁻⁸. The predicted structure of the Amphi-VCPC domain comprises 75 residues (residues 8-82) forming a six-stranded β-sandwich fold (Fig. 5D), which is not typical of IgSF C set molecules. In this model, strands A (residues 17-20) and C (residues 48-51) form one β-sheet, whereas strands B (residues 31-35), E (residues 62-69), and F (residues 72-79) form the second β-sheet. Strand D (residues 56-58) lies on top of the two β-sheets (Fig. 5E). However, the expected central disulfide bond between Cys21 on loop AB and Cys65 on strand E is not formed in the predicted structure. The topology of Amphi-VCPC is quite different from that of Ctid-β2m (Fig. 5F), whereas the predicted structures of Amphi-VCPC and Lamp-TCRLC are remarkably similar.
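Pairwise identities such as 20.7%, 17.6%, 24.68%, and 16.88% depend both on the alignment and on the denominator used. The Python sketch below illustrates two common conventions: dividing by the number of gap-free aligned columns, or by the ungapped length of the first (query) sequence. The function name and its `denominator` parameter are hypothetical, and the sketch does not attempt to reproduce how SWISS-MODEL computes its reported identities.

```python
def pairwise_identity(aligned_a: str, aligned_b: str, denominator: str = "aligned") -> float:
    """Percent identity between two aligned sequences ('-' marks a gap).

    denominator="aligned": divide by columns where neither sequence has a gap.
    denominator="query":   divide by the ungapped length of aligned_a.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    identical = sum(1 for a, b in pairs if a == b)
    if denominator == "aligned":
        n = len(pairs)
    elif denominator == "query":
        n = len(aligned_a.replace("-", ""))
    else:
        raise ValueError("denominator must be 'aligned' or 'query'")
    if n == 0:
        raise ValueError("no residues to compare")
    return 100.0 * identical / n


# Hypothetical toy alignment: same pair, two conventions, two different answers.
print(pairwise_identity("ACDEFGH", "ACD--GK", "aligned"))          # 80.0
print(round(pairwise_identity("ACDEFGH", "ACD--GK", "query"), 1))  # 57.1
```

The gap in the second sequence is what separates the two answers, which is why identities reported by different tools for distantly related domains should be compared with care.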
The Evolutionary Origin of the IgSF C Set-By examining the topologies of the bony fish β2m, Amphi-VCPC, and Lamp-TCRLC molecules, we found features that may indicate the evolutionary origin of the IgSF C set. β2m might be traceable as a descendant of the ancestor of Amphi-VCPC/Lamp-TCRLC-like molecules. In support of this possibility, 1) the 18-19 N-terminal residues of Amphi-VCPC/Lamp-TCRLC could have evolved to become the A strand of β2m, and 2) a mutation in the BC loops of Amphi-VCPC/Lamp-TCRLC molecules may have led to the novel topology found in β2m.

FIGURE 5. … Using the first approach mode in SWISS-MODEL, the three-dimensional structure of the Lingo-1 molecule (2ID5A) was found to share 24.68% identity with the IgSF C domain of Lamp-TCRLC at the amino acid level. The best E-value of the constructed Lamp-TCRLC three-dimensional structure is 2.20 × 10⁻⁸. The three-dimensional structure of Lamp-TCRLC is composed of 76 residues (amino acids 19-94) with a six-stranded β-sandwich fold. D, sequence alignment of Ctid-β2m with Amphi-VCPC. Structure of Amphi-VCPC alone (E) and overlaid with Ctid-β2m (F). The three-dimensional structure of Amphi-VCPC was predicted by homology modeling based on part of a CD4 complex (PDB code 2NY4, chain B). Amphi-VCPC shares 16.88% identity with human T cell surface glycoprotein CD4. The E-value of the constructed three-dimensional structure is 1.00 × 10⁻⁸. The predicted three-dimensional structure of the Amphi-VCPC domain is composed of 75 residues (amino acids 8-82) with a six-stranded β-sandwich fold.

DISCUSSION
In this study, we determined the three-dimensional structure of a bony fish β2m at 2.1 Å, the first β2m structure described from a non-warm-blooded animal. The results show that, like a standard IgSF C set molecule, Ctid-β2m is composed of one sheet of A, B, D, and E strands and another of C, F, and G strands. Two extra hydrogen bonds involving Ile37 and Glu38 and a hydrophobic pocket near the center of Ctid-β2m were also found. Importantly, a short D strand and a longer CD loop might form a flexible region for binding to the MHC heavy chain in bony fish. These features differ from the structures of mammalian and chicken β2ms, which have two D strands, D1 and D2. To explore the evolutionary origin of the independent IgSF C set, we built three-dimensional homology models of Amphi-VCPC and Lamp-TCRLC. Proteins of the fusion form, as well as IgSF C molecules, might be precursors of β2ms, although β2m emerged suddenly in fish species. These results highlight an evolutionary propensity toward stability, together with an unusually flexible CD loop that co-evolved with MHC class I molecules 400 million years ago.
IgSF domains can be classified as variable, constant, strand-switched, or hybrid based on their β-strand topology (2). Although IgSF domains are widespread in the adaptive immune system of vertebrates, only two genes, Amphi-VCP and Lamp-TCRL, have pointed to the significance of C domains in jawless vertebrates and protochordates (13,14). To approach the evolutionary origin of the IgSF C set, the three-dimensional structures of Amphi-VCPC and Lamp-TCRLC were predicted, although homology modeling is not precise. The predicted structure of Lamp-TCRLC is composed of 76 residues that form a six-stranded β-sandwich fold, which is not typical of IgSF C set molecules. The predicted structure of Amphi-VCPC is composed of 75 residues and also forms a six-stranded β-sandwich fold. Surprisingly, the topologies of Lamp-TCRLC and Amphi-VCPC are identical, although the Amphi-VCPC and Lamp-TCRLC lineages split more than 500 million years ago. We hypothesize that the precursors of β2ms are IgSF C set-related molecules as well as Amphi-VCP/Lamp-TCR-like molecules. In support of this hypothesis, the C domains of Amphi-VCPC and Lamp-TCRLC lack a strand corresponding to the A strand of Ctid-β2m. This might imply that the B strand of β2m corresponds to the present-day A strand of Amphi-VCPC/Lamp-TCRLC. The 18-19 residues at the N terminus of the Amphi-VCPC/Lamp-TCRLC molecules could have evolved to become an A strand, as in β2m. In addition, a mutation event might have occurred in the BC loops of Amphi-VCPC/Lamp-TCRLC, leading to the origin of a new strand. If so, the Amphi-VCPC/Lamp-TCRLC topology might have either evolved into the precursor of β2m or given rise to new fusion molecules, such as the B cell receptor, TCR, Ig, MHC, CD4, and CD8.
In conclusion, the crystal structure of Ctid-β2m was solved by molecular replacement. In this structure, two new hydrogen bonds and a strong hydrophobic pocket were discovered, which make the two β-sheets more stable. On the other hand, a single D strand and a long CD loop were found, indicating an unusually flexible D strand in Ctid-β2m. The three-dimensional structure of Ctid-β2m highlights an evolutionary propensity toward stability, together with an unusually flexible CD loop for binding to the MHC class I heavy chain. We hypothesize that β2ms evolved from C set molecules, such as Amphi-VCPC and Lamp-TCRLC, in evolutionarily earlier animals. The predicted three-dimensional structures of both Amphi-VCPC and Lamp-TCRLC distinctly lack the A and D strands of β2m. Furthermore, a region of 18-19 residues at the N terminus of Amphi-VCPC and Lamp-TCRLC could have evolved into the A strand of β2ms. Finally, a mutation in the BC loops of Amphi-VCPC- and Lamp-TCRLC-like molecules may have led to a new topology, forming the basis for what is now a standard C set molecule.