Unconventional Peptide Presentation by Major Histocompatibility Complex (MHC) Class I Allele HLA-A*02:01

Peptide antigen presentation by major histocompatibility complex (MHC) class I proteins initiates CD8+ T cell-mediated immunity against pathogens and cancers. MHC I molecules typically bind peptides 9 amino acids in length with both ends tucked inside the major A and F binding pockets. It has long been known that longer peptides can also bind, either by bulging out of the groove in the middle of the peptide or by binding in a zigzag fashion inside the groove. In a recent study, we identified an alternative binding conformation of naturally occurring peptides from Toxoplasma gondii bound by HLA-A*02:01. These peptides were extended at the C terminus (PΩ) and contained charged amino acids not more than 3 residues after the anchor amino acid at PΩ, which enabled them to open the F pocket and expose their C-terminal extension to the solvent. Here, we show that the mechanism of F pocket opening is dictated by the charge of the first charged amino acid found within the extension. Whereas positively charged amino acids result in the Tyr-84 swing, negatively charged amino acids induce a previously undescribed Lys-146 lift. Furthermore, we demonstrate that the peptides with alternative binding modes have properties that fit very poorly to the conventional MHC class I pathway and suggest that they are presented via alternative means, potentially including cross-presentation via the MHC class II pathway.

Peptide presentation by MHC class I molecules determines which fragments of a pathogen or cancer antigen are displayed to cytotoxic T cells for immune recognition. Understanding the mechanism of antigen presentation by MHC I is crucial for designing therapeutic strategies that modulate the subsequent immune responses to control disease.
Toxoplasmosis is a parasitic disease caused by infection with the large intracellular protozoan Toxoplasma gondii (1,2). Although generally asymptomatic in healthy adults, T. gondii infection can cause congenital toxoplasmosis during pregnancy and result in abortion or neonatal disease (1,2). T cell-mediated immunity against T. gondii-derived peptide antigens provides strong protection against T. gondii and involves peptide presentation by both major histocompatibility complex class I (MHC I) and class II (MHC II) proteins (3-5). Although T. gondii can interfere with CD4 T cell responses by down-regulating MHC II expression in IFN-γ-activated macrophages, immunization with T. gondii MHC II peptide ligands can elicit a potent CD4 T cell response that can lower parasite burden in the brain (6,7). Immunocompromised individuals and patients with T cell deficiencies are highly susceptible to T. gondii infections (8,9). CD8+ T cell responses have been studied more widely than CD4+ T cell responses, and the MHC I peptide ligands identified so far derive from surface proteins or from proteins of specialized secretory organelles (rhoptry proteins) that can be secreted into either the parasite cytosol or the parasitophorous vacuole (8-14).
HLA-A*02:01 has been the focus of studies aimed at identifying MHC I-restricted peptide ligands that confer protection against T. gondii in HLA transgenic mice (15) and as such is a suitable MHC class I allele for studying the basic rules of peptide presentation. Most canonical peptide ligands for MHC I are 9-10 amino acids in length. However, peptides with more than 11 amino acids have also been identified as MHC I ligands and form the non-canonical ligand group (13,16,17). These long peptides have been shown to interact with the residues of the binding groove of the HLA class I heavy (α) chain much like the canonical binders, with some changes. The second (P2) and C-terminal (PΩ) residues of the antigen peptide anchor into the A and F pockets of the binding groove, respectively, whereas the middle portion of these oversized peptides either "bulges out" or "zigzags" in the binding groove to be accommodated (18,19).
In contrast to these "bulged" peptides, we recently identified longer T. gondii peptides eluted from HLA-A*02:01 molecules that had a conserved N-terminal start but differed in their residue composition at the C terminus (20). We showed through crystallographic studies that, in the HLA-A*02:01 complex with one 12-mer peptide, residue Tyr-84 of the MHC heavy chain swung out and opened the F pocket, allowing the C-terminal amino acid of the peptide to protrude into the solvent, whereas the nested 11-mer N-terminal core peptide bound in a conventional zigzag orientation with both peptide ends tucked inside the peptide binding groove.
To further investigate whether the opening of the binding groove could be achieved with other peptides presented on T. gondii-infected cells and to understand the structural requirements for such unconventional modes of binding, we crystallized complexes of HLA-A*02:01 with several pairs of core (nested) and C-terminally extended peptides. Surprisingly, we found that there are at least two distinct modes of opening the F pocket of HLA-A*02:01, involving the residues Tyr-84 and Lys-146. We suggest that these unconventional modes of binding will help to better understand the targets of MHC class I-restricted epitope recognition.

Results
Crystal Structures of HLA-A*02:01 in Complex with Conventional and Extended Peptides-To identify additional peptides with likely unconventional binding motifs, we scanned the set of peptides eluted from HLA-A*02:01 for those that had poor predicted binding affinity of the full-length peptide (percentile rank >10%) but contained a nested N-terminal peptide with high predicted affinity (percentile rank <2%). In our previous study, we had examined one such peptide (FVLELEPEWTVK), which had a single lysine added to the C terminus of the core peptide (FVLELEPEWTV) and induced a structural change in Tyr-84 of HLA-A*02:01 (20). In contrast, in the current study, we examined three sets of peptides that had C-terminal amino acid additions that contained negatively charged amino acids or both negatively and positively charged residues (Fig. 1). To investigate whether the F pocket of HLA-A*02:01 could also be opened by these extending peptides, we refolded HLA-A*02:01 with several nested and extending peptides and determined the crystal structures of these complexes. We obtained crystal structures for all complexes at resolutions between 1.85 and 2.75 Å (Table 1). Electron densities for all the peptides were well defined over the entire peptide length that is bound within the binding groove, whereas C-terminally extending residues that did not contact HLA-A*02:01 were disordered (Fig. 1). When all the different peptides are compared, slight structural changes in HLA-A*02:01 are observed in the A pocket. Peptides with an N-terminal tyrosine (YLSPIASPL, YLSPIASPLL, and YLSPIASPLLDGKSLR) open the A pocket slightly for the bulky side chain to be accommodated, whereas peptides that begin with glycine (GLKEGIPAL, GLKEGIPALDN, GLLPELPAV, and GLLPELPAVGGNE) are more buried inside the A pocket because they lack any side chain (Fig. 1). In addition, subtle structural changes are observed throughout the binding groove to allow optimal binding of the different amino acid side chains.
However, when the structures of the core peptides were compared with those of their respective extended peptides, the position of Tyr-84 of HLA-A*02:01 was unchanged. Surprisingly, Lys-146 of the F pocket, which is located close to Tyr-84 and forms a "lid" to bury the PΩ amino acid in the core peptides, moved upward to open the F pocket when the extending peptides were bound (Fig. 1). Although Lys-146 adopts slightly different positions when all the extending peptide structures are compared, in each structure the Lys-146 lid was opened for the C-terminal extensions to protrude from the F pocket.
Hydrogen Bond Network for Nested and Longer Peptide Pairs-Next, we looked at the detailed interactions between HLA-A*02:01 and the individual peptides. In the case of peptide GLKEGIPAL, an extensive hydrogen bond network is seen involving the PΩ leucine residue and residues of the heavy chain that line the binding groove including Asp-76, Thr-80, Tyr-84, Thr-143, Lys-146, and Trp-147 (Fig. 2a, upper panel). In contrast, for the extended peptide GLKEGIPALDN, the hydrogen bond between Lys-146 and the terminal carboxyl group of PΩ leucine is replaced with one between Lys-146 and the PΩ+1 aspartate side chain because Lys-146 adopts a different orientation (Fig. 2a, lower panel). In the case of the peptide pair GLLPELPAV and GLLPELPAVGGNE, a similar hydrogen bond network is observed for both peptides with only a minor difference in the crystal structure of the longer peptide. The hydrogen bond interaction between the terminal carboxylate of PΩ valine and Lys-146 is missing in the crystal structure with the longer peptide (Fig. 2b). The same is true for the peptide pair YLSPIASPL and YLSPIASPLLDGKSLR (Fig. 2c). As a result, the change in the orientation of Lys-146 leads to the loss of hydrogen bond formation with the carboxylate of the PΩ amino acid. However, the hydrogen bond interaction between HLA-A*02:01 residue Trp-147 and the backbone oxygen of the P8 amino acid remains conserved. Depending on the amino acid following residue P8, a novel hydrogen bond can be formed with the side chain of a compatible amino acid at PΩ+1 (here Asp-10). Because the C-terminally extending amino acids project away from the peptide binding groove, electron density becomes increasingly disordered as the peptide exits the F pocket (Fig. 1).
Extending Peptides Do Not Significantly Destabilize HLA-A*02:01-To determine the relative stability of the individual HLA-A*02:01·peptide complexes, we followed their thermal denaturation by differential scanning fluorimetry. The melting temperatures (Tm) obtained from the melt curves allowed us to compare the stability of the different complexes (Fig. 3). We observed that complexes of HLA-A*02:01 with extended peptides had stability similar to that of complexes with their equivalent nested peptides, with not more than 8°C difference between them. For example, the Tm for the HLA-A*02:01 complex with GLKEGIPAL is 63°C, whereas that with its longer peptide counterpart is 61°C. Adding 6 residues to peptide YLSPIASPLL to give YLSPIASPLLDGKSLR likewise changes the Tm of the complex by only about 7°C (Fig. 3 and Ref. 20). Interestingly, some of the complexes with nested peptides are as stable as those with longer peptides (compare GLKEGIPAL with GLLPELPAVGGNE; Fig. 3). It is also worth noting that there are some variations in the stability of complexes with different nested peptides. For instance, peptide GLKEGIPAL forms a less stable complex than peptide GLLPELPAV (Fig. 3). Although most interactions between HLA-A*02:01 and the peptides are conserved, there is a significant difference in the hydrogen bond distance between Tyr-84 and the terminal carboxylate of each peptide (3.65 Å for GLKEGIPAL (G9L) and 2.9 Å for GLLPELPAV (G9V); Fig. 2). The lack of an intimate hydrogen bond interaction of the terminal amino acid with Tyr-84 in the G9L peptide is likely a major contributor to the reduced melting temperature. Compared with our previous study, we noticed that a single positively charged amino acid addition (compare FVLELEPEWTV and FVLELEPEWTVK; Ref. 20) destabilizes the protein·peptide complex more than a short (2-amino acid) extension containing a negatively charged residue (compare GLKEGIPAL with GLKEGIPALDN; Fig. 3).
However, longer peptide additions (4-6 amino acids) that contain a negatively charged amino acid (YLSPIASPLL versus YLSPIASPLLDGKSLR and GLLPELPAV versus GLLPELPAVGGNE) reduce the stability of the protein·peptide complex to the same extent as the single "Lys" addition found in peptide FVLELEPEWTVK (20). This highlights that the structural change involving Tyr-84 of HLA-A*02:01 is more destabilizing than that involving Lys-146 when the negatively charged peptide extension is very short.
Lysine 146 Lift-The different orientations of Lys-146 upon binding of the longer peptides open the F pocket of HLA-A*02:01 and are required for the C-terminally extending amino acids to project into the solvent because Lys-146 forms a partial lid above the F pocket held in position by the hydrogen bond interaction of the carboxylate of the C-terminal amino acid (PΩ) of any nested peptide (Figs. 2 and 4). Although the position of Lys-146 is not precisely conserved between the different structures of HLA-A*02:01 bound to the extending peptide, the lift of the residue to accommodate the peptide extension seems to be consistent. In the case of the crystal structure of HLA-A*02:01 with YLSPIASPL, YLSPIASPLL, and YLSPIASPLLDGKSLR, there are variations in the way that the two nested peptides are accommodated in the binding groove. With YLSPIASPL, the binding of the residues is quite conventional with the P2 and PΩ anchor residues binding to the A and F pockets. Surprisingly, however, YLSPIASPLL, the nested peptide with one extra leucine residue at the C-terminal end of the peptide, undergoes a certain extent of bulging to accommodate the terminal leucine residue (PΩ+1 instead of PΩ) as the anchor residue in the F pocket (Fig. 4d). The difference in the binding of these two nested peptides to HLA-A*02:01 underscores the requirement for a sequence motif or particular amino acid features within the bound peptide to induce movement of Lys-146 to open the F pocket. Because a mere increase in length of the peptide does not cause the change in orientation of Lys-146, it is likely that the addition of charged residues within the C-terminal extension is the contributing factor to open the F pocket.
Thus, in addition to the previously identified "Tyr-84 swing" to accommodate the peptide FVLELEPEWTVK (UFP(16-27)) (20), we observed a "Lys-146 lift" in HLA-A*02:01 as a second mechanism of opening the F pocket induced by the extending peptides YLSPIASPLLDGKSLR, GLKEGIPALDN, and GLLPELPAVGGNE (Fig. 5).
It's a Game of Charge-The negatively charged amino acids in the extended peptides do not always immediately follow the nested conventional PΩ anchor residue but can be several residues downstream (Fig. 5c). The previously reported longer peptide contains a C-terminal addition of a positively charged residue (FVLELEPEWTVK) that opens the binding groove using the Tyr-84 swing mechanism. Here, we observed that all extended peptides with a negatively charged residue opened the binding groove exclusively by the Lys-146 lift mechanism. This included the peptide YLSPIASPLLDGKSLR, which contains a lysine (positive charge) residue following the aspartate (negative charge) but showed no Tyr-84 swing. This suggests that the first charged residue determines which of the two distinct structural modes of binding an extended peptide will induce in HLA-A*02:01.
Extensions with Negative Charges Are Longer and More Frequent than Extensions with Positive Charges-Given the discovery of several ligands binding in an unconventional mode, we aimed to assess the generality of these findings. In the T. gondii peptide elution data set, a total of 134 peptides were predicted to bind with conventional P2-PΩ anchors with predicted rank ≤10%. Of the remaining 150 peptides with poor predicted binding, 108 contained a nested strong binder (rank ≤2%) at the N terminus with 1 or more residues extending beyond PΩ. These 108 peptides are expected to be enriched for examples with a similar unconventional mode of binding as those in our structural studies.
We classified the ligands as having a "negative" or "positive" extension based on the first charged residue found after the nested binding peptide. Nearly half of the extended ligands contained a negatively charged residue as the first charged amino acid within the first 3 residues of the extension, whereas positively charged extensions were less frequent (Fig. 6a). Positive extensions were short (less than 3 residues on average), whereas negative extensions had an average length of 8.7 residues (Fig. 6b). Notably, 66% of the positive extensions consisted of 1 or 2 residues compared with only 8% of negative extensions being shorter than 3 amino acids. Considering only the longest version of ligands with extensions of multiple sizes, the average length of negative extensions was 11.0 residues.
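The classification rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the code used in the study: the function names `classify_extension` and `extension_of` are hypothetical, and treating the 3-residue cutoff as the first `window` positions after PΩ is an assumption about how the rule was applied.

```python
from typing import Optional

POSITIVE = set("RK")  # Arg, Lys
NEGATIVE = set("DE")  # Asp, Glu


def classify_extension(extension: str, window: int = 3) -> Optional[str]:
    """Label a C-terminal extension by its first charged residue.

    Only the first `window` residues after the anchor are inspected
    (assumed reading of the 3-residue cutoff). Returns 'positive',
    'negative', or None when no charged residue is found there.
    """
    for residue in extension[:window].upper():
        if residue in POSITIVE:
            return "positive"
        if residue in NEGATIVE:
            return "negative"
    return None


def extension_of(ligand: str, core: str) -> str:
    """Residues that follow the nested N-terminal core peptide."""
    if not ligand.startswith(core):
        raise ValueError("core must be an N-terminal prefix of the ligand")
    return ligand[len(core):]


# Peptide pairs taken from the text.
print(classify_extension(extension_of("FVLELEPEWTVK", "FVLELEPEWTV")))    # positive
print(classify_extension(extension_of("GLKEGIPALDN", "GLKEGIPAL")))       # negative
print(classify_extension(extension_of("YLSPIASPLLDGKSLR", "YLSPIASPL")))  # negative
```

The last example returns "negative" even though a lysine follows the aspartate, because only the first charged residue is considered.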
Extended Peptide MHC I Ligands Show a Putative Processing Motif That Is More Similar to MHC Class II Ligands than Conventional MHC I Ligands-Given the unconventional length and mode of binding of the observed extended peptides and given that T. gondii has an unusual compartmentalized life cycle in the cells it infects, we wanted to examine whether the unconventional ligands found had the typical motifs of peptides derived from the conventional MHC class I processing and presentation pathway. As shown in Fig. 6, this was not the case. Extended ligands had significantly lower scores for TAP transport (Fig. 6c) compared with canonical T. gondii ligands (p = 1.2 × 10⁻⁵, Wilcoxon rank sum test) but were not significantly different from those of random peptides (p = 0.34). Similarly, proteasome cleavage scores (Fig. 6d) for extended ligands were significantly lower compared with canonical ligands (p = 5 × 10⁻¹⁶) but were not significantly higher than the random natural peptides (p = 0.24). In other words, long ligands with terminal extensions were predicted to be poor substrates both for proteasome cleavage and TAP transport, suggesting an alternative mechanism for the generation and translocation to MHC class I of these extended ligands.
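For readers unfamiliar with the Wilcoxon rank sum test used in these comparisons, the following Python sketch shows how the statistic is formed. It is an illustration only: the function name `rank_sum_test` is hypothetical, the scores are invented, and the p value uses the normal approximation without tie or continuity corrections, so it will differ slightly from the output of a statistics package.

```python
import math
from typing import Sequence, Tuple


def rank_sum_test(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Two-sided Wilcoxon rank sum test (normal approximation).

    Returns (z, p). Tied values receive their average rank; no tie or
    continuity correction is applied to the variance.
    """
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1
    # Sum of the ranks that belong to the first sample.
    w = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    n1, n2 = len(x), len(y)
    mean = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


# Invented scores, for illustration only (not data from this study).
extended_scores = [0.10, 0.15, 0.20, 0.22, 0.30]
canonical_scores = [0.60, 0.70, 0.75, 0.80, 0.95]
z, p = rank_sum_test(extended_scores, canonical_scores)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A negative z indicates that the first sample tends to have lower scores than the second.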
Given the life cycle of T. gondii, it is possible that the unconventional MHC I peptide ligands derived from it are processed and (cross-)presented through the same pathway as MHC II ligands. If that is the case, we would expect not only that these ligands have amino acid motifs different from those found for MHC I ligands (as was shown above) but also that they have a pattern congruent with what is found for MHC II ligands. Accordingly, we examined the amino acid patterns of the C-terminal residues in extended ligands. Remarkably, despite being predominantly negatively charged or uncharged in the first 3 residues of the extension, a large fraction of ligands presented either an arginine or lysine at the very C-terminal residue (42 of 108). If these ligands were cross-presented, we would expect similar trimming motifs in class II ligands, and we examined published data sets of MHC II ligands for evidence of such a motif (see "Experimental Procedures"). Indeed, we found that the C-terminal composition of the extended T. gondii class I ligands strongly correlated with the residue distribution at the C terminus of eluted MHC class II ligands, where positively charged amino acids were also enriched (Fig. 6e). Taken together, we found that the unconventional MHC I ligands presented by T. gondii have sequence motifs much more consistent with cross-presentation than with generation through proteasomal cleavage and TAP transport.

Discussion
αβ T cell receptor (TCR) recognition of MHC-presented microbial peptides initiates T cell-mediated immunity against infection. Generally, the TCR binds with both TCR α and β chains in a diagonal orientation above the MHC molecule (28).
Although the germ line-encoded complementarity-determining region 1 and 2 loops bind to MHC, the hypervariable loops complementarity-determining regions 3α and 3β specifically bind and recognize the peptide and provide antigen specificity (21). MHC I has a closed binding groove, and peptides bind with both ends tucked inside the binding pocket, whereas MHC II has an open binding pocket, and peptide ligands (typically 15-20 amino acids) bind with both N and C termini hanging over the end of the groove. Because MHC I presents peptides in a more confined space compared with MHC II, the TCR of CD8+ T cells often contacts and discriminates their entire peptide sequence (21). In our present study, we have focused on a panel of C-terminally extended T. gondii peptides that contain a negatively charged amino acid within their C-terminal extensions. Analysis of a large data set of HLA-A*02:01-eluted T. gondii peptides from our previous study (20) demonstrated that C-terminally extending peptides are very common in T. gondii and that additions that contain negatively charged amino acids are more frequent than those that contain only positively charged residues.
FIGURE 6 (legend; opening words missing). a, [...] T. gondii eluted set were more frequent than expected compared with a background distribution of resampled control data sets, whereas positively charged extensions (Arg and Lys) were less frequent. b, positive extensions were on average shorter than 3 residues, and negative extensions had a mean length larger than 8 residues. c, TAP transport scores for extended ligands were significantly lower than for canonical ligands and not significantly different from random. d, proteasome cleavage scores for extended ligands were significantly lower than for canonical ligands and not significantly higher than random. e, the C-terminal composition of T. gondii ligands correlates with the C-terminal enrichment of class II ligands with positively charged residues in the first quadrant and hydrophobic amino acids dominating the third quadrant. IEDB, Immune Epitope Database. Error bars represent S.D.
These peptides contain a canonical HLA-A*02:01 binding motif at their N termini, but addition of the C-terminal extensions renders predictions about their binding to MHC I difficult.
Although the N termini of these peptides bind like canonical peptides to the MHC I allele HLA-A*02:01, the C-terminal extensions induce a structural change at the F pocket to allow their extension into the solvent. As such, algorithms aimed at predicting peptide binding to MHC I need to capture rules that identify a canonical N-terminal MHC I binding motif, and additional descriptors, such as charged residues in the C-terminal extension, are necessary to predict binding of C-terminally extending peptides. In principle, these rules could be learned directly from peptide binding data using machine learning techniques, such as the recently described extended NNalign method (22,23). However, this is complicated by the very limited amount of available quantitative data on noncanonical binding. We envision this situation changing as more binding data become available and as MHC ligand data are included in the training data of MHC class I binding prediction algorithms. We showed that only charged residues that follow the P9 amino acid of a canonical peptide within 3 or fewer amino acids induce the structural change that allows them to bind to MHC I and stabilize the complex. These rules can now be incorporated into existing algorithms to predict MHC I binding peptides, at least for HLA-A*02:01.
Because the residues that are involved in the F pocket opening (Tyr-84 and Lys-146) are conserved across all HLA-A, -B, and -C alleles (and found in many non-classical MHC I molecules, such as HLA-E, HLA-G, and Qa-1 as well as viral MHC I mimics such as UL18), we postulate that the ability to open the F pocket is a universal characteristic found across many MHC I molecules. Although it is not clear to what extent this structural change affects CD8+ T cell recognition, killer immunoglobulin receptors (KIRs), a family of activating and inhibitory receptors expressed on natural killer cells, natural killer T cells, and many CD4+ and CD8+ T cells, bind directly above the F pocket. As a consequence, any structural change around the F pocket would likely affect KIR binding and directly modulate host immune responses (24,25). In particular, the inability of inhibitory KIRs to engage MHC I would lower the threshold of activation for many more immune cells to combat infection or cancers. Future studies will have to address the origin and potential function of these longer peptides in host immunity.

Experimental Procedures
HLA-A*02:01 Expression and Purification-HLA-A*02:01 class I heavy chain ectodomain (residues 21-274) and human β2-microglobulin (β2m) (residues 1-99) were expressed as inclusion bodies and refolded as reported previously (20) with modifications reported here. Briefly, both the heavy chain and light chain were expressed in Escherichia coli BL21 DE3 cells and induced at A600 of 0.6 with 1 mM isopropyl 1-thio-β-D-galactopyranoside, and cells were harvested after 4 h by centrifugation (5000 × g for 20 min). Cells were resuspended separately in lysis buffer (100 mM Tris-HCl, pH 7.0, 5 mM EDTA, 5 mM DTT, 0.5 mM PMSF), and the cells were broken with four to five passes through a microfluidizer (20 kilopascals) (Microfluidics). Cell lysate was centrifuged (50,000 × g for 30 min at 4°C) to collect inclusion bodies. Inclusion bodies were further resuspended in wash buffer A (100 mM Tris-HCl, pH 7.0, 5 mM EDTA, 5 mM DTT, 2 M urea, 2% (w/v) Triton X-100), centrifuged again, and washed in wash buffer B (100 mM Tris-HCl, pH 7.0, 5 mM EDTA, 2 mM DTT). Finally, the inclusion bodies were denatured in extraction buffer (50 mM Tris-HCl, pH 7.0, 5 mM EDTA, 2 mM DTT, 6 M guanidine HCl) for subsequent refolding. 3 mg of β2m was added dropwise to 250 ml of refolding buffer (0.1 M Tris-HCl, pH 8.0, 2 mM EDTA, 400 mM L-arginine, 5 mM oxidized glutathione, 5 mM reduced glutathione) and stirred for 1-2 h. Between 11 and 15 mg of HLA-A heavy chain mixed with 2-3 mg of individual peptide (GenScript) was then added to the refolding mixture and further stirred at 4°C for 72 h. Final heavy chain:light chain:peptide ratios were in the range of 2:1:12 and 2.5:1:12 for the different peptides.
Following refolding, the refolding mixture was centrifuged at 50,000 × g to remove any precipitated protein, and the supernatant was concentrated to about 3 ml for size exclusion chromatography using a Superdex S200 HR16/60 gel filtration column pre-equilibrated with size exclusion chromatography buffer (20 mM Tris-HCl, pH 7.5, 150 mM NaCl). Fractions containing refolded HLA-A*02:01·β2m·peptide complexes were pooled, concentrated to 5-12 mg/ml, and used for subsequent crystallization experiments.
Crystallization and Data Collection-Initial attempts to obtain crystals for the HLA-A*02:01·peptide complexes using factorial screens were not successful. Thin needle-shaped sea urchin crystals of HLA-A*02:01·UFP(16-27) complex obtained in 1.2 M sodium citrate were used to cross-seed the other complexes. The complexes were equilibrated in 30% PEG 4000, 0.1 M Tris-HCl, pH 8.0, 0.2 M lithium sulfate for 1-2 h by mixing 0.15 µl of complex and 0.15 µl of precipitant at 20°C before seeding. Thin platelike crystals were obtained by sitting drop vapor diffusion at 20°C after 2-4 days. The crystals were flash frozen in cryoprotectant (crystallization solution, 100% glycerol; 3:1) using liquid nitrogen.
Diffraction data for HLA-A*02:01 complex with peptides G9V, G11N, G13E, and Y16R were collected remotely at beamline 7.1 at the Stanford Synchrotron Radiation Lightsource and processed to 1.86-, 2.3-, 2.1-, and 2.4-Å resolution, respectively, using HKL2000. Diffraction data for HLA-A*02:01 complex with peptides G9L, Y9L, and Y10L were collected remotely at beamline 12.3.1 at the Advanced Light Source and processed to 1.85-, 2.5-, and 2.75-Å resolution, respectively, using HKL2000. Phases were obtained by molecular replacement with Phaser MR (26) in ccp4i (27,28) using the protein coordinates for HLA-A*02:01 (Protein Data Bank code 3MRE) and resulted in unambiguous electron density for all the peptides. Model building was carried out using Coot (29,30). Structures were refined using Refmac (31). Data collection and structure refinement parameters are provided in Table 1.
Thermal Denaturation Assay-HLA-A*02:01·β2m·peptide complexes with the various peptides were analyzed for thermal denaturation by differential scanning fluorimetry using a LightCycler 480 (Roche Applied Science). HLA-A*02:01·β2m·peptide complexes with different peptides at 100 µM in reaction buffer (20 mM Tris-HCl, pH 7.5, 150 mM NaCl) were used as stock solutions. Each reaction mixture constituted 1-2 µl of protein complex stock solution and 2 µl of SYPRO Orange dye (100×; Invitrogen) made up to 20 µl in reaction buffer in a 96-well white plate compatible with the instrument. A temperature gradient from 20 to 85°C at steps of 0.06°C/s and 10 acquisitions/°C was used for the experiment. Each experiment with an individual protein·peptide complex was repeated thrice. A melt curve of the total fluorescence of the run was plotted against temperature. The minima of the first derivative of the melt curve from raw fluorescence data (temperature differential of absolute fluorescence versus temperature) provided the Tm for individual HLA-A*02:01·β2m·peptide complexes (inflection point of the melt curve) (32).
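The Tm readout from a melt curve can be sketched as follows. This is an illustration under stated assumptions, not the instrument software: the function name `melting_temperature` is hypothetical, the derivative is taken with the sign convention -dF/dT (so that an unfolding transition with rising fluorescence appears as a minimum), and the melt curve is a synthetic two-state sigmoid rather than measured data.

```python
import math
from typing import Sequence


def melting_temperature(temps: Sequence[float], fluorescence: Sequence[float]) -> float:
    """Temperature at the minimum of -dF/dT, estimated by central differences."""
    if len(temps) != len(fluorescence) or len(temps) < 3:
        raise ValueError("need at least three paired readings")
    best_temp, best_value = temps[1], float("inf")
    for i in range(1, len(temps) - 1):
        slope = (fluorescence[i + 1] - fluorescence[i - 1]) / (temps[i + 1] - temps[i - 1])
        if -slope < best_value:  # assumed sign convention: -dF/dT
            best_temp, best_value = temps[i], -slope
    return best_temp


# Synthetic melt curve with its midpoint at 63 °C (illustrative only).
temps = [20.0 + 0.1 * i for i in range(651)]  # 20 to 85 °C, 10 readings per °C
signal = [1.0 / (1.0 + math.exp(-(t - 63.0) / 1.5)) for t in temps]
print(round(melting_temperature(temps, signal), 1))
```

Real traces are noisier than this sigmoid, so in practice the curve is usually smoothed before differentiation.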
Binding Affinity Predictions and Analysis of Extensions-Binding affinities for all 284 eluted ligands to HLA-A*02:01 in the data set of McMurtrey et al. (20) were predicted using NetMHCpan-3.0 (21). Peptides with conventional P2-PΩ anchors and predicted rank of up to 10% were considered as canonical binders. A rank score lower than 10% indicates that a peptide is among the 10% strongest binders for HLA-A*02:01 in a large pool of random natural peptides. Peptides that were not predicted to bind canonically but contained a nested 8-11-mer subsequence with predicted high affinity (rank within top 2%) at the N terminus were classified as extended peptides. Extensions following the PΩ of unconventional binders were categorized based on the first charged residue of the extension found within the first 3 residues of the extension. Extensions where the first charged residue was an Arg or Lys were classified as positive, whereas extensions where the first charged residue was an Asp or Glu were considered negative. As controls, we randomly picked amino acid sequences from the same set of source proteins with the same length distribution as the observed extensions. We repeated the sampling for random extensions a thousand times to be able to calculate distributions and compared them with the observed extensions.
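The rank-based sorting of ligands into canonical and extended groups can be sketched as below. This is a hedged illustration: `classify_ligand` is a hypothetical name, `predict_rank` is a stand-in callable for a binding predictor (no real prediction tool is invoked), the percentile ranks in the example are invented, and choosing the best-ranked core when several nested lengths qualify is an assumption.

```python
from typing import Callable, Optional, Tuple


def classify_ligand(
    peptide: str,
    predict_rank: Callable[[str], float],
    canonical_cutoff: float = 10.0,
    nested_cutoff: float = 2.0,
) -> Tuple[str, Optional[str]]:
    """Return (label, core) with label 'canonical', 'extended', or 'other'.

    predict_rank maps a peptide to a predicted percentile rank
    (lower means stronger predicted binding); it is supplied by the caller.
    """
    if predict_rank(peptide) <= canonical_cutoff:
        return "canonical", None
    best: Optional[Tuple[float, str]] = None
    # Nested N-terminal cores of 8 to 11 residues that leave an extension.
    for core_length in range(8, 12):
        if core_length >= len(peptide):
            break
        core = peptide[:core_length]
        rank = predict_rank(core)
        if rank <= nested_cutoff and (best is None or rank < best[0]):
            best = (rank, core)
    if best is not None:
        return "extended", best[1]
    return "other", None


# Invented percentile ranks, for illustration only.
toy_ranks = {"GLKEGIPALDN": 25.0, "GLKEGIPAL": 0.5, "GLLPELPAV": 0.1}


def toy_predictor(peptide: str) -> float:
    return toy_ranks.get(peptide, 50.0)


print(classify_ligand("GLKEGIPALDN", toy_predictor))  # ('extended', 'GLKEGIPAL')
print(classify_ligand("GLLPELPAV", toy_predictor))    # ('canonical', None)
```

Passing the predictor in as a function keeps the thresholds and the core search independent of whichever prediction tool supplies the ranks.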
Derivation of an MHC II Processing Motif Based on Published Data-A large set of 16,868 unique eluted HLA class II ligands was downloaded from the Immune Epitope Database (33) and inspected for amino acid enrichment at the C terminus. We compared the frequency of the very last residue at the C terminus in the MHC II ligands and in the T. gondii ligands, applying a pseudocount correction (34) with β = 50. Pseudocounts exploit information about amino acid similarity to smooth the observed amino acid frequencies of small sequence data sets. This correction has a negligible effect on the large set of MHC II ligands, but it is important in the T. gondii data set because some amino acids were never observed at the C terminus. Enrichment scores were then calculated as $S_A = \log_2(f_A / q_A)$, where $f_A$ is the pseudocount-corrected frequency for amino acid A and $q_A$ is the background frequency of A in natural proteins.
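The enrichment score can be illustrated with a short Python sketch. Note the substitution: instead of the similarity-based pseudocount method of Ref. 34, this example uses a simpler background-weighted pseudocount, f_A = (n_A + β q_A) / (N + β), purely to show how the correction avoids taking the logarithm of zero. The function name, the four-letter alphabet, and the peptides are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Iterable


def c_terminal_enrichment(
    ligands: Iterable[str], background: Dict[str, float], beta: float = 50.0
) -> Dict[str, float]:
    """log2 enrichment of each amino acid at the C-terminal position.

    Simplified stand-in for the correction in the text: each count is
    blended with the background frequency, weighted by beta.
    """
    if beta <= 0:
        raise ValueError("beta must be positive so unseen residues get f > 0")
    counts = Counter(p[-1] for p in ligands if p and p[-1] in background)
    total = sum(counts.values())
    scores = {}
    for amino_acid, q in background.items():
        f = (counts.get(amino_acid, 0) + beta * q) / (total + beta)
        scores[amino_acid] = math.log2(f / q)
    return scores


# Toy four-letter alphabet with a uniform background (illustration only).
toy_background = {"K": 0.25, "R": 0.25, "L": 0.25, "A": 0.25}
toy_ligands = ["AAK", "GGK", "TTR", "SSL"]
for amino_acid, score in c_terminal_enrichment(toy_ligands, toy_background, beta=4.0).items():
    print(amino_acid, round(score, 3))
```

Positive scores mark residues that are over-represented at the C terminus relative to the background, and negative scores mark depleted residues.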
Prediction of Proteasomal Cleavage and TAP Transport-Predictions of proteasomal cleavage and TAP transport were obtained using the MHC class I processing tools of the Immune Epitope Database (35,36) for ligands predicted to bind both canonically and in the extended mode. For the processing predictions, precursors for all ligands were obtained by elongating them by 1 residue at the N terminus (to allow for transport of elongated precursors by TAP) and 5 residues at the C terminus (to cover the residues thought to impact cleavage by the proteasome) using the context of their source protein. As a control, we also produced proteasome and TAP scores for 100 random natural sequences extracted from UniProt.
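The precursor construction described above can be sketched as follows. This is a minimal illustration with assumptions stated plainly: the function name `precursor` is hypothetical, the first occurrence of the ligand in the source protein is used, flanks are truncated at the protein termini, and the source sequence in the example is invented.

```python
def precursor(source_protein: str, ligand: str, n_flank: int = 1, c_flank: int = 5) -> str:
    """Elongate a ligand using the flanking residues of its source protein.

    Adds n_flank residues at the N terminus and c_flank residues at the
    C terminus; fewer are added when the ligand lies near a protein end.
    """
    start = source_protein.find(ligand)
    if start < 0:
        raise ValueError("ligand not found in source protein")
    end = start + len(ligand)
    return source_protein[max(0, start - n_flank):end + c_flank]


# Invented source sequence containing the ligand GLKEGIPAL (illustration only).
toy_protein = "MAAAGLKEGIPALDNKKKKKQ"
print(precursor(toy_protein, "GLKEGIPAL"))  # AGLKEGIPALDNKKK
```

The resulting 15-residue string carries 1 upstream and 5 downstream residues around the 9-mer, which is the context the processing predictions are described as using.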