Employing a Recombinant HLA-DR3 Expression System to Dissect Major Histocompatibility Complex II-Thyroglobulin Peptide Dynamism

Previously, we have shown that statistical synergism between amino acid variants in thyroglobulin (Tg) and specific HLA-DR3 pocket sequence signatures conferred a high risk for autoimmune thyroid disease (AITD). Therefore, we hypothesized that this statistical synergism mirrors a biochemical interaction between Tg peptides and HLA-DR3, which is key to the pathoetiology of AITD. To test this hypothesis, we designed a recombinant HLA-DR3 expression system that was used to express HLA-DR molecules harboring either AITD susceptibility or resistance DR pocket sequences. Next, we biochemically generated the potential Tg peptidic repertoire available to HLA-DR3 by separately treating 20 purified human thyroglobulin samples with cathepsins B, D, or L, lysosomal proteases that are involved in antigen processing and thyroid biology. Sequences of the cathepsin-generated peptides were then determined by matrix-assisted laser desorption ionization time-of-flight-mass spectroscopy, and algorithmic means were employed to identify putative AITD-susceptible HLA-DR3 binders. From four predicted peptides, we identified two novel peptides that bound strongly and specifically to both recombinant AITD-susceptible HLA-DR3 protein and HLA-DR3 molecules expressed on stably transfected cells. Intriguingly, the HLA-DR3-binding peptides we identified had a marked preference for the AITD-susceptibility DR signatures and not to those signatures that were AITD-protective. Structural analyses demonstrated the profound influence that the pocket signatures have on the interaction of HLA-DR molecules with Tg peptides. Our study suggests that interactions between Tg and discrete HLA-DR pocket signatures contribute to the initiation of AITD.

The major histocompatibility complex (MHC) II molecule is intimately associated with the initiation of autoimmunity. Studies in several MHC II-associated autoimmune conditions, most notably type 1 diabetes, have now convincingly demonstrated that susceptibility to disease is caused by certain structural features of the MHC II peptide binding cleft that influence the display of immunogenic peptides (1)(2)(3)(4)(5). Similarly, we have shown that the autoimmune thyroid diseases (AITD), Graves disease (GD) and Hashimoto thyroiditis (HT), are strongly associated with unique HLA-DR peptide binding cleft sequences (6,30). Mechanistically, these pocket amino acid signatures could accommodate autoantigenic peptides derived from any of the thyroid-specific proteins, namely thyroglobulin (Tg), thyroid peroxidase, and thyrotropin receptor. Indeed, there exist data demonstrating that thyroglobulin, which constitutes 80% (8) of total thyroidal protein, is an important thyroidal target and participant in triggering AITD (reviewed in Ref. 9). Along a similar vein of thought, there is mounting evidence that suggests a role for polymorphisms in the thyrotropin receptor, the specific target of Graves disease, in the initiation of Graves hyperthyroidism (28).
For the case of thyroid autoimmunity, the connection between MHC II susceptibility signatures and how they influence autoantigenic peptide binding is gradually being unveiled. Work from our own group has shown that a single amino acid variation in the HLA-DR3 peptide binding cleft, resulting in an arginine at position 74 of the β chain (DRβ-Arg-74), was strongly associated with GD (6). Moreover, upon coinheriting DRβ-Arg-74 and a disease-associated Tg SNP (R1999W), the risk of developing GD was elevated significantly beyond the additive effect of these two variants, resulting in an odds ratio of 15 (7). Thus, we hypothesized that this statistical interaction mirrors an in vivo biochemical interaction between thyroglobulin peptides and the disease-associated pocket signature of HLA-DR3.
Given the potential importance of Tg/HLA-DR3 interactions, factors that could influence the generation of Tg-derived peptides, and/or the binding of a given thyroglobulin peptide to HLA-DRβ-Arg-74, would be expected to be strongly linked to the etiology of AITD. For example, cathepsins B, D, and L, endosomal proteases involved in antigen processing (11), are also key enzymes in Tg degradation (12) and thyroid biology (13,14), potentially making them major determinants of the Tg peptidic repertoire (15). In addition to extrinsic factors such as proteases, factors that are intrinsic to Tg, such as polymorphisms, could, by virtue of the induction of either a change in conformation or a change in the proteolytic cleavage site, render a given region more or less sensitive to digestion and thus alter the pool of bioavailable peptides (reviewed in Ref. 16).
The goals of this study were to examine, at both a biochemical and structural level, the features of the HLA-DR susceptibility and resistance variants that influence Tg peptide binding. Toward achieving this end, we have designed an Escherichia coli-based system that enabled us to make functional versions of "empty" HLA-DR3. Furthermore, we have used these recombinant MHC II molecules to examine binding to selected peptides from a putative bioavailable Tg peptide pool, as well as to two Tg-derived peptides that were previously reported to bind to HLA-DR3 (17). Here, we present data that shed insights on how susceptibility and resistance signatures of the HLA-DR peptide binding pocket influence the selection of the Tg peptides that can be presented to autoreactive T-cells in the initiation of AITD.

Purification of Thyroglobulin from Human Thyroid Tissues-The protocol was approved by the University of Cincinnati Institutional Review Board for obtaining de-identified tissues. Twenty thyroid samples were obtained either from healthy portions of glands removed at surgery or from thyroids taken at autopsy (total death time ≤16 h) that were purchased from the NCBI Brain and Tissue Bank at the University of Maryland. Thyroid tissue samples were subsequently stored at −80°C until use. Each thyroid tissue (typically 0.2-0.6 g) was thawed and resuspended in a 10-fold volume/tissue weight in 20 mM disodium hydrogen phosphate, 50 mM KCl, 10% glycerol, pH 7.0, in the presence of 1 mM phenylmethylsulfonyl fluoride, 1 mM EDTA, caproic acid (added 25 μl per 25 ml of buffer), and a mixture of protease inhibitors (catalogue number 04 693 116 001, Roche Applied Science; 1 tablet added per 25 ml of buffer). The sample was treated in a Dounce homogenizer, and protein was precipitated via the addition of 0.4 g/ml ammonium sulfate. Precipitate was next resuspended in 10-20 ml of resuspension buffer. Dialysis followed, using a membrane with 15,000 molecular weight cutoff against 3 liters of 20 mM disodium hydrogen phosphate, 50 mM KCl, 1 mM EDTA, and 10% glycerol, pH 7.0. The sample was filtered, using 0.22-μm membranes, and was applied to a Sephacryl S-200 (GE Healthcare) size exclusion column. The thyroglobulin-containing portion, which came off in the void volume, was then applied to an anion exchange column (Anx-Sepharose 4 fast flow, catalogue number 17-1287-10, GE Healthcare) and was run at 2 ml/min, using a linear gradient from 50 to 600 mM KCl. Thyroglobulin eluted in a broad peak, spanning 200-600 mM KCl. Fractions were pooled and were subsequently concentrated to 1 ml, using a filter (Pierce) with a 20,000 molecular weight cutoff.
The concentrate was diluted 1:20 in the final buffer (10 mM phosphate, 25 mM KCl, 10% glycerol, pH 7.0) and was then concentrated to 1-10 mg/ml protein, as determined using a protein assay dye reagent kit (Bio-Rad). Samples were snap-frozen in liquid nitrogen and were subsequently stored at −80°C.
DNA Purification from Thyroid Tissue-DNA was extracted from a small section of each thyroid tissue, using the Puregene kit (Gentra Systems, Minneapolis, MN), according to the manufacturer's protocol.
SNP Genotyping-SNPs, identified within the Tg gene, were analyzed by a fluorescent-based restriction fragment length polymorphism method, as described previously (18).
Cathepsin Digestion of Individual Thyroglobulin Samples-Cathepsins B, D, and L were used separately to digest Tg. Digestions were performed as follows: to 40 μg of human Tg (in 33 μl of 10% glycerol, 10 mM phosphate, pH 7.0, 25 mM KCl), 1.875 μl of 1 M citrate, pH 3.5, 0.375 μl of 1 M MgCl2, 37.15 μl of distilled H2O, and 2.6 μl of cathepsin (resuspended in 500 mM citrate, pH 3.5, 50% glycerol) were added. The final cathepsin activity was 0.085 units/μl for cathepsins B and D and 0.003 units/μl for cathepsin L. Cathepsins B (Sigma) and D (Sigma) were purified from human liver, whereas cathepsin L (Cal Biochem) was derived from a human cDNA, expressed recombinantly. Digestions were performed at 37°C for 16-20 h. Separation of the digested Tg protein on an SDS-7.5% polyacrylamide gel versus undigested control verified that the digestion into constituent peptides was nearly complete.
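The digestion recipe can be checked with simple dilution arithmetic. The sketch below is illustrative only: it assumes that volumes are additive and that the garbled volume units in the recipe are microliters, and it derives final concentrations that are not stated in the text. The variable and function names are hypothetical.

```python
# Volumes in microliters, taken from the digestion recipe in the text.
components_ul = {
    "Tg stock (40 ug in 10% glycerol, 10 mM phosphate, 25 mM KCl)": 33.0,
    "1 M citrate, pH 3.5": 1.875,
    "1 M MgCl2": 0.375,
    "distilled water": 37.15,
    "cathepsin stock (500 mM citrate, 50% glycerol)": 2.6,
}

# Assumption: volumes are additive on mixing.
total_ul = sum(components_ul.values())


def final_conc(stock_conc, volume_ul):
    """Dilute a stock concentration (any unit) into the full reaction volume."""
    return stock_conc * volume_ul / total_ul


# Citrate comes from two sources: the 1 M stock and the cathepsin storage buffer.
citrate_mM = final_conc(1000.0, 1.875) + final_conc(500.0, 2.6)
mgcl2_mM = final_conc(1000.0, 0.375)
# Glycerol comes from the Tg buffer (10%) and the cathepsin buffer (50%).
glycerol_pct = final_conc(10.0, 33.0) + final_conc(50.0, 2.6)
# 40 ug in total_ul microliters; ug/ul is numerically equal to mg/ml.
tg_mg_per_ml = 40.0 / total_ul

print(f"total volume: {total_ul:.2f} ul")
print(f"citrate: {citrate_mM:.1f} mM, MgCl2: {mgcl2_mM:.1f} mM")
print(f"glycerol: {glycerol_pct:.2f} %, Tg: {tg_mg_per_ml:.3f} mg/ml")
```

Under these assumptions the components sum to a 75-μl reaction, which is consistent with the recipe having been designed around round final concentrations.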
Mass Spectroscopy-Peptide profiles of cathepsin-digested thyroglobulins were analyzed using matrix-assisted laser desorption ionization time-of-flight-mass spectroscopy. Desalting on target was used to eliminate peptide loss during peptide preparation for matrix-assisted laser desorption ionization analysis. α-Cyano-4-hydroxycinnamic acid was used as matrix and prepared as a saturated solution in a 1:4:4 (v/v) mixture of formic acid/water/isopropyl alcohol. A small aliquot (1 μl) of a mixture of peptide/matrix (1:1) was spotted onto the sample plate. As soon as crystals were formed, the excess liquid was removed by vacuum aspiration. The spot was then washed twice for a few seconds with 2 μl of 0.1% trifluoroacetic acid. All spectra were acquired on a Voyager-DE STR time-of-flight mass spectrometer (Applied Biosystems, Foster City, CA) in linear and reflector mode. The spectra were smoothed, calibrated, and analyzed using the program M-over-Z.
Peptide sequences were acquired on a nanoflow liquid chromatography/tandem mass spectrometry instrument. The nano-LC system included a Micro-Tech XtremeSimple nano-LC pump system (Vista, CA) and a Shimadzu SIL-20AC AutoSampler (Kyoto, Japan). Peptides were separated on a reversed-phase C-18 column (100 μm (inner diameter) × 15 cm, 3 μm, 300 Å) (Micro-Tech, CA) by linear gradient elution of 3-50% solvent B in 60 min, using solvent A containing 98% water, 2% acetonitrile, 0.1% formic acid; and solvent B containing 98% acetonitrile, 2% water, 0.1% formic acid at a flow rate of 600 nl/min. An LTQ-Orbitrap (Thermo-Fisher, MA) was used to collect mass data, and the instrument was operated in the data-dependent mode. Survey MS spectra (from m/z 300 to 1600) were acquired in the Orbitrap with resolution R = 30,000 at m/z = 400. The five most intense ions were isolated and fragmented in a linear ion trap. The resulting fragment ions were recorded in the Orbitrap with resolution R = 15,000 at m/z = 400 or in the linear ion trap. Peptides were identified via automated data base searching by Sorcerer (version 3.10.7) (Sage-N) against an in-house created human thyroglobulin protein sequence data base composed of all entries of human thyroglobulin protein from the NCBI Reference Sequence (RefSeq) data base and our own thyroglobulin protein sequence containing SNP information. No digestion was selected during the creation of the search data base from the Fasta files. Mass tolerances on MS peaks were 5 ppm and on tandem MS peaks were 0.5 Da. Scaffold 2.0 was used to open the search result files. Only peptides with an XCorr value from SEQUEST searching software above the following settings, Sequest XCorr >1.8 (1+), >2.0 (2+), >2.2 (3+), and >2.5 (4+ and above), or an E value from X!Tandem searching software above 2 were accepted for identification.
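The acceptance criteria above amount to a charge-dependent score filter. The following is a minimal sketch of that rule, not the Scaffold or Sorcerer implementation; the function names are hypothetical, and the X!Tandem value is treated simply as a score that must exceed 2, as stated in the text.

```python
# Charge-dependent SEQUEST XCorr cutoffs from the text (strictly greater than).
XCORR_MIN_BY_CHARGE = {1: 1.8, 2: 2.0, 3: 2.2}
XCORR_MIN_HIGH_CHARGE = 2.5  # applies to charge 4+ and above
XTANDEM_MIN = 2.0            # X!Tandem value must exceed this


def xcorr_threshold(charge):
    """Return the XCorr cutoff for a precursor charge state."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return XCORR_MIN_BY_CHARGE.get(charge, XCORR_MIN_HIGH_CHARGE)


def accept_identification(charge, xcorr=None, xtandem_score=None):
    """Accept a peptide if either search engine score passes its cutoff."""
    passes_sequest = xcorr is not None and xcorr > xcorr_threshold(charge)
    passes_xtandem = xtandem_score is not None and xtandem_score > XTANDEM_MIN
    return passes_sequest or passes_xtandem


def within_ppm(observed_mass, theoretical_mass, tolerance_ppm=5.0):
    """Check a precursor mass against the 5 ppm MS tolerance."""
    error_ppm = abs(observed_mass - theoretical_mass) / theoretical_mass * 1e6
    return error_ppm <= tolerance_ppm


print(accept_identification(charge=2, xcorr=2.1))
print(accept_identification(charge=4, xcorr=2.4, xtandem_score=2.5))
print(within_ppm(1000.004, 1000.0))
```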
Correlation of SNP Genotypes and Peptides Produced by Digestion-The peptide profiles, analyzed by MS, were correlated with the Tg SNP genotypes. Across the different mass spectra, a representative peak was taken as an internal standard. The heights of all other peaks in the spectrum were normalized with respect to the internal standard peak. Normalized peak heights were then examined for correlation with the Tg genotype for three Tg SNPs that were shown to be associated with AITD (E10SNP24 (S734A or rs180223), E12SNP (M1027V or rs853326), and E33 SNP (R1999W or rs11535853)) (10).
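The normalization step can be sketched as follows. This is an illustration under stated assumptions: the peak m/z values, heights, and genotype labels are hypothetical placeholders rather than data from this study, and the per-genotype mean is only one simple way to inspect the normalized heights, since the text does not specify the correlation statistic used.

```python
from collections import defaultdict


def normalize_peaks(peaks, standard_mz):
    """Divide every peak height by the height of the internal standard peak."""
    reference = peaks[standard_mz]
    if reference <= 0:
        raise ValueError("internal standard peak must have a positive height")
    return {mz: height / reference for mz, height in peaks.items()}


def mean_by_genotype(samples, mz):
    """Average the normalized height of one peak within each genotype group."""
    grouped = defaultdict(list)
    for genotype, normalized in samples:
        grouped[genotype].append(normalized[mz])
    return {g: sum(values) / len(values) for g, values in grouped.items()}


# Hypothetical spectra: {m/z: raw peak height}; 1320.7 plays the internal standard.
spectrum_a = {1045.6: 200.0, 1320.7: 500.0, 1788.9: 100.0}
spectrum_b = {1045.6: 300.0, 1320.7: 600.0, 1788.9: 60.0}

norm_a = normalize_peaks(spectrum_a, standard_mz=1320.7)
norm_b = normalize_peaks(spectrum_b, standard_mz=1320.7)

# Hypothetical genotype labels for one SNP.
means = mean_by_genotype([("CC", norm_a), ("CT", norm_b)], mz=1045.6)
print(norm_a)
print(means)
```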
Peptide and Oligonucleotide Synthesis-Peptides (>95% purity) were ordered commercially either from AlphaDiagnostic International (San Antonio, TX) or GenScript (Piscataway, NJ). Peptides were synthesized with biotin, at their N termini, to facilitate subsequent detection. Oligos were purchased from either Sigma-Proligo (The Woodlands, TX) or Integrated DNA Technologies (Coralville, IA).
Western Blot Analyses-Frozen thyroid tissues (a total of six samples) were resuspended in lysis buffer (PBS with 1 mM EDTA and protease inhibitors) at 2.5× the volume of tissue sample. Samples were vortexed 3-4 times in 5-s bursts and were kept on ice for 30 min with vortexing every 10 min. Samples were subsequently spun down at 14,000 rpm for 10 min at 4°C and boiled for 5 min. Protein concentration was determined using a protein assay dye reagent (Bio-Rad) against the standard curve constructed with increasing amounts of bovine serum albumin (Bio-Rad). 10 μg of each sample were loaded onto an SDS-10% polyacrylamide gel. Before loading, samples were heated at 95°C for 2-3 min in sample buffer under reducing conditions. Size standards consisted of pre-stained protein markers from New England Biolabs (Beverly, MA). Contents of the gel were transferred to a polyvinylidene difluoride membrane. After transfer, membranes were blocked, for 30 min, in 5% milk TBST. Primary antibodies were used at a 1:1000 dilution, in 2.5% milk TBST, and incubated overnight. After washing, the membrane was exposed to a horseradish peroxidase-conjugated goat anti-mouse secondary antibody for 45 min. Signal was detected using the chemiluminescent system from Amersham Biosciences. Primary antibodies used were mouse monoclonals against cathepsin D (Cell Signaling), cathepsin B (Zymed Laboratories Inc.), cathepsin L (Zymed Laboratories Inc.), and β-actin (Sigma). The secondary antibody consisted of a horseradish peroxidase-conjugated goat anti-mouse IgG antibody (Pierce). Signal was visualized with the ECL system (GE Healthcare) and exposure to x-ray film.
HLA-DR3 cDNA-The cDNAs for the α (DRA*010202) and β chains (DRB1*030101) of HLA-DR3 were generously provided by Dr. N. Koch (University of Bonn) and by Dr. C. Katovich-Hurley (Georgetown University). Sequencing of the β chain confirmed the presence of arginine at position β74. Individual amino acids of the peptide binding cleft at the β chain were changed, as indicated under "Results," using the QuikChange II or QuikChange multisite-directed mutagenesis kits (Stratagene, catalogue number 200523 and 200515), as per the manufacturer's instructions.
Anti-HLA-DR Antibodies-Hybridoma cell lines were purchased from the American Type Culture Collection (Manassas, VA). Clone HB-55 was used to make the monoclonal antibody L243, which specifically recognizes the DRα chain when it is correctly folded and complexed with its β chain complement (free α chains do not bind L243) (19). Hybridomas were grown in serum-free media (Thermo Scientific). Prior to use in experiments, IgG molecules were purified from the spent media, using protein G conjugated to agarose (Roche Applied Science). Concentration was determined by measuring the absorbance at 280 nm (1 A280 nm unit = 0.8 mg/ml immunoglobulin).
Cell Lines-Cell lines used in this study either expressed MHC II molecules endogenously or were stably transfected to express MHC II. VAVY cells (expressing HLA-DR3 B1*0301, DRA 0102, DQA0501, DQB10201, DPA10201, and DPB10101) were obtained from the European Collection of Cell Cultures. Rat2 cells, derived from rat embryonic kidney tissue, were purchased from the ATCC and were subsequently engineered to express HLA-DR3. Despite the absence of invariant chain and endogenous MHC II molecules, Rat2 fibroblast cells are fully capable of assembling functional MHC class II proteins (20).
Rat2 cells stably transfected with HLA-DR3 were made as follows. The cDNA of the α chain was subcloned into the BamHI/XhoI sites of pcDNA 3.1+ vector (Invitrogen), containing the hygromycin resistance gene, and the cDNA of the β chain was cloned into the BamHI/XhoI sites of pcDNA 3.1+ vector containing the neomycin resistance gene. Cells were cotransfected with a total of 0.5 μg of the DRα and DRβ chain cDNA constructs, using Lipofectamine 2000 (Invitrogen), and grown in Dulbecco's modified Eagle's medium containing 10% fetal calf serum supplemented with high glucose, 10 mM L-glutamine, pyridoxine-HCl, 300 μg/ml neomycin, and 50 μg/ml hygromycin for selection of double-positive clones (DRα and DRβ). After several passages, cells were analyzed for DR expression using the fluorescence-activated cell sorter and using the HLA-DR-specific monoclonal antibody, L243. Positive cells were sorted and expanded.
Peptide Binding Analysis by Flow Cytometry-Biotinylated peptides were tested for binding to Rat2 cells expressing recombinant HLA-DR, based on methodology reported previously (21). Fluorescein isothiocyanate-conjugated avidin D (Pierce) was used to detect the N-terminal biotinylated peptide. Amplification of the signal was achieved by incubating with 5 μg/ml biotinylated avidin-D-specific antisera (Pierce). Control samples consisted of Rat2 cells not transfected with HLA-DR, as well as HLA-DR-expressing Rat2 cells incubated without peptide. Phycoerythrin-conjugated L243 (BD Biosciences) was used to detect the presence of HLA-DR molecules on the cell surface. The isotype control consisted of phycoerythrin-conjugated mouse IgG2a (BD Biosciences).
For all flow cytometry experiments, binding was measured on a Coulter Epics XL (Beckman Coulter, Miami, FL). Dead cells and debris were excluded from analysis by setting the appropriate threshold trigger on forward-angle light scatter. Log fluorescence was collected for fluorescein isothiocyanate using a 525 band pass filter and phycoerythrin using a 575 band pass filter. In all experiments, 10,000 gated events were collected. For data analysis, peptide binding was expressed as the number of cells that bound peptide/the number of HLA-DR3 expressing cells.
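The binding measure defined above is a ratio of event counts. The sketch below shows one way to compute it from gated events; it assumes that "cells that bound peptide" means events positive for both fluorescein (peptide) and phycoerythrin (HLA-DR), and the gate values and event intensities are hypothetical placeholders rather than instrument settings from this study.

```python
def peptide_binding_ratio(events, fitc_gate, pe_gate):
    """Fraction of HLA-DR-positive events that are also peptide-positive.

    events: iterable of (fitc_intensity, pe_intensity) pairs for gated cells.
    fitc_gate, pe_gate: positivity thresholds (hypothetical; in practice they
    would be set from the no-peptide and isotype controls).
    """
    dr_positive = [event for event in events if event[1] > pe_gate]
    if not dr_positive:
        raise ValueError("no HLA-DR-positive events in this sample")
    bound = [event for event in dr_positive if event[0] > fitc_gate]
    return len(bound) / len(dr_positive)


# Hypothetical events: (FITC signal, PE signal).
example_events = [(5, 50), (40, 60), (35, 2), (3, 1), (50, 80)]
ratio = peptide_binding_ratio(example_events, fitc_gate=10, pe_gate=10)
print(f"peptide-bound fraction of DR-positive cells: {ratio:.3f}")
```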
Isolation of MHC II Molecules from B-cell Lines-Fully processed and naturally occurring HLA-DR3 molecules were purified from the VAVY cell line based on the reported protocols (22,47). Briefly, 10⁹ cells were lysed for 60 min at 4°C in 0.01 M Tris, 0.15 M NaCl, pH 7.4, containing 2% Triton X-100 and proteinase inhibitors. HLA-DR3 molecules were purified from clarified cell lysates by immunoaffinity chromatography using L243, which was covalently linked to protein A-agarose, using the protein A IgG orientation kit (Thermo Scientific). Protein content was evaluated by the Bio-Rad protein assay reagent, and purity was assessed by SDS-PAGE.
Construction of DRα and DRβ Fusion Protein (Fig. 1)-We have used a method successfully employed for HLA-DR expression systems in SF9 cells. Briefly, to achieve heterodimer stability (23) in the absence of a transmembrane domain, the extracellular portion of the DRB1*0301 β chain was fused to the coiled-coil region of the basic leucine zipper domain of JunB, whereas the extracellular portion of the α chain was fused to the coiled-coil region of the basic leucine zipper domain of Fos. These dimerization motifs have been demonstrated to preferentially favor the creation of heterodimers versus homodimers (24). Empty HLA-DR3 molecules were constructed by expressing each DR chain separately in E. coli and then refolding them together. Expression and purification were based on previous reports (25,26). "Sticky end" PCR was employed to fuse the respective α or β ectodomain with its respective dimerization motif. 100 ng of the XhoI-linearized pcDNA3.1 construct containing the α chain were used as a template with forward primer (5′-TGCCCTCGAGAAAAGAATCAAAGAAGAACATGTGATC-3′; XhoI site is in boldface) and reverse primer (5′-TCCTCCTCCGTCGACTCCCTGAAAGTACAGGTTCTCGTTCTCTGTAGTCTCTGGGAG-3′; SalI site is in boldface, and TEV protease cleavage site is underlined). The cDNA clone containing the FOS dimerization domain was purchased from the ATCC (clone number MGC-11074). PCR was performed, using the cDNA clone as a template, along with forward primer (5′-GGAGTCGACGGAGGAGGACTGACTGATACACTCCAAGCG-3′; SalI site is in boldface) and reverse primer (5′-GCGGTTAATTAATCATGCATAATCTGGAACATCATATGGATATCGGTGAGCTGCCAGGATGAA-3′; PacI site is in boldface; stop codon is underlined, and hemagglutinin epitope tag is italicized). The amplicons of the DRα ectodomain and the FOS dimerization domain were joined together by sticky end PCR, and the amplicon was then subcloned into the XhoI/PacI sites of pKLAC1 (New England Biolabs, Beverly, MA).
For construction of the β chain construct, 100 ng of XhoI-linearized pcDNA3.1 containing the DRβ construct were used as a template with forward primer (5′-CTGCCCTCGAGAAAAGAGGGGACACCAGACCACGTTTC-3′; XhoI site is in boldface) and reverse primer (5′-TCCTCCTCCGTCGACTCCCTGAAAGTACAGGTTCATCTTGCTCTGTGCAGATTC-3′; SalI site is in boldface, and TEV protease cleavage site is underlined). The cDNA clone containing the junB dimerization domain was purchased from OriGene (catalogue number TC118763, Rockville, MD) and was used as a template with forward primer (5′-GGAGTCGACGGAGGAGGACGCATCGCGCGCCTGGAGGAC-3′; SalI site is in boldface) and 250 nM reverse primer (5′-GCGGTTAATTAATCAGTGGTGGTGGTGGTGGTGGTTCATGACTTTCTGCTT-3′; PacI site is in boldface; the stop codon is italicized, and the histidine tag is underlined). As with the DRα construct, the amplicons of the DRβ ectodomain and the JunB dimerization domain were joined together by sticky end PCR. The amplicon was then subcloned into the XhoI/PacI sites of pKLAC1.
Solubilization and Purification of the Bacterially Expressed HLA-DR3-As the protein was expressed mainly in inclusion bodies, a scheme based on a protocol reported previously (25) for the refolding of HLA-DR from E. coli was employed. Initially, the pellet containing either the α or β chain was thawed in PBS, with 1 mM EDTA, 1 mM phenylmethylsulfonyl fluoride (Sigma), and protease inhibitors (Roche Applied Science). After resuspension and incubation on ice for 5 min, the sample was centrifuged at 10,000 rpm for 20 min in a Sorvall RC-5b centrifuge using a sm-24 rotor. The soluble fraction was discarded, and the insoluble pellet was washed twice with isopropyl alcohol and centrifuged, as in the previous step. The washed pellet was then solubilized, and the His-tagged protein was purified, as per manufacturer's instructions (Novagen, His-Bind kit manual), under denaturing conditions, using 6 M guanidinium hydrochloride (Sigma).
Refolding of the Bacterially Expressed HLA-DR3-To form a complete heterodimeric MHC II molecule, 5.5 mg of the β chain was combined with 5.5 mg of the α chain in 25 ml of 6 M guanidinium HCl/elute buffer (elute buffer made as described in the His-Bind kit). Immediately, the volume was brought up to 200 ml with dialysis buffer (20 mM Tris-HCl, pH 8.0 (Sigma), 1 mM EDTA (Sigma), 3 mM glutathione-reduced (Sigma), 0.3 mM glutathione-oxidized (Sigma), 50 mM KCl (Sigma), and 33% glycerol (catalogue number 49767, Sigma)). The recombined α and β chains were dialyzed against 3 liters of the dialysis buffer, with three changes being performed over the course of 72 h. All dialysis steps used a membrane of 15,000 molecular weight cutoff (Spectra/Por). Refolded protein was isolated by loading the recombined protein onto an Anx-Sepharose column (GE Healthcare) and running a gradient, which started at 50 mM KCl and ended at 1 M KCl in 20 mM Tris, pH 8.0, 10% glycerol. A final dialysis was performed against 10 mM Tris, pH 8.0, 25 mM KCl, 10% glycerol. The concentration of the bacterially expressed MHC II was determined as described for thyroglobulin.
ELISA-Plates were coated overnight at 4°C with 10 μg/ml L243 in bicarbonate buffer, pH 9.5, in a 100-μl volume. Solutions with limiting amounts (96 nM) of recombinant HLA-DR3 were challenged with 5 μM biotinylated peptide for 24-48 h, at 37°C, in a buffer consisting of PBS, 0.05% Triton X-100, and 1% bovine serum albumin. Controls consisted of just peptide, incubated under identical conditions, with no MHC II, to account for any nonspecific binding of the peptide to the ELISA plate. Subsequent to blocking with 200 μl of 2.5% bovine serum albumin/PBS, at 37°C for 1 h, and four rounds of washing with PBS, 100 μl of peptide/HLA-DR3 mixtures were bound in individual wells of 96-well plates. After incubation, for 1 h at 37°C, wells were washed 4-5 times with PBS, 0.05% Triton X-100. Next, 100 μl of ImmunoPure streptavidin-alkaline phosphatase conjugate (Pierce) was added to each well, in a 1:2500 dilution (0.25 units/ml), and incubated at 37°C for 30 min. Wells were then washed six times with PBS. Bound streptavidin alkaline phosphatase was visualized through the addition of p-nitrophenyl phosphate (Sigma) and measuring A405 nm. To quantitate the femtomoles of peptide bound, a calibration curve was constructed with known amounts of streptavidin-horseradish peroxidase. As each mg of streptavidin-horseradish peroxidase can bind to 4 μg of biotin (according to the product information provided by the manufacturer), the amount of biotin and hence the amount of peptide binding could be quantified.
Competition Experiment-For competition experiments, ELISAs were done as above, with the following modifications. Plates were coated with 20 μg/ml L243, and 4 μg/ml streptavidin/europium (PerkinElmer Life Sciences) was used to detect HLA-DR-bound peptide. After washing the europium conjugate with PBS, 200 μl of inducer solution was added, and finally the fluorescence was read using a POLARstar Omega fluorescent plate reader (BMG LabTech). Biotinylated apopeptide (previously shown to be a peptide that bound with high affinity to HLA-DR3), bound at 5 μM concentration, was competed off with increasing levels of "cold" (nonbiotinylated) competitor peptides.
MD Simulations and Energetic Analysis-The complexes of DR3 with the three Tg peptides (Tg.1571, Tg.726, and Tg.1951) were constructed with AMBER using the known structure of DR3 with the CLIP peptide (Protein Data Bank code 1A6A). The register of the peptides in the complex was determined by a multiple alignment of the peptides that bind to DR3. The registers are as follows (numbers mark positions 1, 4, 6, and 9 of the bound core): Tg.1571, Pro-Glu-Ser-Lys-Val1-Ile-Phe-Asp4-Ala-Asn6-Ala-Pro-Val9-Ala-Val; Tg.726, Pro-Thr-Pro-Cys-Gln1-Leu-Gln-Ala4-Glu-Gln6-Ala-Phe-Leu9-Arg-Thr; and Tg.1951, Arg-Lys-Lys-Val-Ile1-Leu-Glu-Asp4-Lys-Val6-Lys-Asn-Phe9. The corresponding residues in the CLIP peptide were substituted with the sequence of the respective peptide. A similar approach was used to construct the DR74Q mutant of DR3. Details of the simulations and MM-PBSA analysis are presented in the supplemental material.
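The register notation above can be read mechanically: the numbered residues mark positions 1, 4, 6, and 9 of a nine-residue core within each peptide. The sketch below parses that notation and checks that the numbered positions are mutually consistent. It is an illustration only, not part of the AMBER workflow; the function names are hypothetical, and the position numbers are written directly after the residue name (for example `Val1`).

```python
import re

TG_1571 = "Pro-Glu-Ser-Lys-Val1-Ile-Phe-Asp4-Ala-Asn6-Ala-Pro-Val9-Ala-Val"
TG_1951 = "Arg-Lys-Lys-Val-Ile1-Leu-Glu-Asp4-Lys-Val6-Lys-Asn-Phe9"


def parse_register(annotated):
    """Split an annotated peptide into residues and {core position: index}."""
    residues, marked = [], {}
    for index, token in enumerate(annotated.split("-")):
        match = re.fullmatch(r"([A-Za-z]{3})(\d*)", token.strip())
        if match is None:
            raise ValueError(f"unrecognized token: {token!r}")
        name, position = match.groups()
        residues.append(name)
        if position:
            marked[int(position)] = index
    return residues, marked


def core_and_anchors(annotated):
    """Return the nine-residue core and the residues at the numbered positions."""
    residues, marked = parse_register(annotated)
    p1_index = marked[1]
    # Every numbered position must sit at the matching offset from position 1.
    for position, index in marked.items():
        if index - p1_index != position - 1:
            raise ValueError("numbered positions are not mutually consistent")
    core = residues[p1_index:p1_index + 9]
    if len(core) != 9:
        raise ValueError("peptide is too short to contain a nine-residue core")
    anchors = {f"P{pos}": residues[idx] for pos, idx in sorted(marked.items())}
    return core, anchors


for name, peptide in (("Tg.1571", TG_1571), ("Tg.1951", TG_1951)):
    core, anchors = core_and_anchors(peptide)
    print(name, "-".join(core), anchors)
```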

Expression of Individual α and β Chains in E. coli-Individual HLA-DR3 chains were expressed as described under "Experimental Procedures" (Fig. 1). Fig. 2A shows the high expression levels of each chain, after induction in E. coli. Fig. 2B demonstrates that an intact α/β heterodimer is formed after refolding.
Cathepsin Expression in the Human Thyroid Gland-Western blot analyses established that cathepsins B, D, and L were well expressed in human thyroidal tissue (Fig. 3) at the protein level. Interestingly, most of the cathepsin D in the thyroid exists in the pro-form, suggesting that the activity of this protein is tightly regulated. Similar results have been reported in murine thyroid (29) as well. In contrast, cathepsins B and L were expressed mostly in their active forms.
Cathepsin-mediated Digestion of Thyroglobulin-Purified thyroglobulin ran with a predominant band at 330 kDa, on an SDS gel, under reducing conditions (Fig. 4, lane 2). Incubation with cathepsin B, D, or L, under acidic conditions (pH 3.5) mimicking the lysosomal milieu, resulted in substantial degradation of Tg (Fig. 4, lane 3). Sequences (118 peptides for cathepsin B, 171 peptides for cathepsin D, and 48 peptides for cathepsin L) of individual peptides generated from cathepsin-digested Tg were determined by high pressure liquid chromatography-tandem MS analysis (supplemental Tables S1-S3) and were consistent across the 20 different thyroglobulin samples. The three cathepsins exhibited different cut site preferences, as leucine and phenylalanine were the two most common amino acids used by cathepsins B and D, whereas glycine and arginine were the cathepsin L-preferred amino acids (supplemental Fig. 1).
(Figure legend fragment) Actin levels served to confirm that equal amounts of protein were added to each lane.
DECEMBER 4, 2009 • VOLUME 284 • NUMBER 49

Autoimmune Thyroid Disease Susceptibility Genes
Correlation between Tg SNPs and Cathepsin-generated Tg Peptide Profiles-Mechanistically, a Tg amino acid variant could affect the Tg peptidic repertoire if the SNP either adds or removes a cathepsin cut site. Our analysis of the peptide profiles generated by cathepsin B, D, and L digestion of Tg obtained from individuals harboring the three disease-associated Tg amino acid variants (10) did not demonstrate a correlation between any Tg amino acid variant or haplotype and levels of cathepsin B-, D-, or L-generated peptides (data not shown). Hence, these susceptibility polymorphisms may contribute to thyroid autoimmunity through mechanisms independent of cathepsin B, D, or L action (e.g. they could modify Tg entry into endosomes or peptide binding to MHC class II).
Identifying Potential Peptides for Binding Studies-We took a reverse immunological approach using two peptide-MHC II binding prediction algorithms: ProPred and RankPep. From the entire list of cathepsin-generated peptides, four gave high probabilities of binding to Arg-74+ HLA-DR3 by both algorithms and were tested. All experimentally tested peptide sequences, along with those peptides previously reported to bind to HLA-DR3 (17,31), are summarized in Table 1.
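The selection rule described here, keeping only peptides that both algorithms rate highly, is a simple consensus filter. The sketch below illustrates that logic only: it does not call ProPred or RankPep, and the peptide labels, scores, and cutoffs are hypothetical placeholders rather than values from this study.

```python
def consensus_binders(scores_a, scores_b, cutoff_a, cutoff_b):
    """Return peptides scored at or above the cutoff by both predictors."""
    shared = set(scores_a) & set(scores_b)
    return sorted(
        peptide
        for peptide in shared
        if scores_a[peptide] >= cutoff_a and scores_b[peptide] >= cutoff_b
    )


# Hypothetical outputs from two predictors, each on its own score scale.
predictor_a_scores = {"pep_1": 4.2, "pep_2": 1.1, "pep_3": 3.8, "pep_4": 5.0}
predictor_b_scores = {"pep_1": 12.5, "pep_2": 14.0, "pep_3": 3.0, "pep_4": 11.1}

selected = consensus_binders(
    predictor_a_scores, predictor_b_scores, cutoff_a=3.5, cutoff_b=10.0
)
print(selected)
```

Requiring agreement between two independent predictors trades sensitivity for specificity, which suits a study in which each candidate must then be synthesized and tested experimentally.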
Peptide Binding Studies and the Influence of the DR3β Pocket Signature. The recombinant MHC II protein was fully functional, as evinced by its ability to bind peptides even better than HLA-DR3 isolated from a DR3 homozygous cell line (VAVY from ATCC) (data not shown). Previously, our genetic studies defined residues Gln-70, Lys-71, and Arg-74 of the DRβ1 chain as being highly associated with the development of AITD, whereas residues Arg-70, Arg-71, and Gln-74 were protective (30). Hence, three distinct signatures were examined as follows: the first recombinant molecule harbored the susceptible amino acids at all three positions (Gln-70, Lys-71, and Arg-74); the second contained glutamine at position β74 (protective) together with Gln-70 and Lys-71 (susceptible), a construct that enabled us to dissect the individual contribution of position 74 to peptide binding; the third molecule had a completely protective signature with Arg-70, Arg-71, and Gln-74. The amount of each peptide that could be bound by an equivalent amount of each signature MHC II molecule was determined.
The experimental peptides were compared with a previously identified peptide, apo (31), which binds strongly and specifically to Arg-74+ HLA-DR3 (positive control), as well as with two thyroglobulin peptides, Tg.2673 and Tg.2507, predicted to be poor binders (negative controls). In addition, we tested two Tg peptides previously reported to bind DR3, Tg.726 and Tg.2098 (17). Consistent with the output of the MHC II binding algorithms, Tg.2673 and Tg.2507 did not bind (data not shown), whereas two of the predicted binders (Tg.1571 and Tg.1951) showed strong and specific binding when tested (Table 2). Binding data are summarized in Table 2, which reports the femtomoles of each peptide bound to a constant amount of HLA-DR3. The order of binding affinity was as follows: apo > Tg.1571 > Tg.726 > Tg.1951 > Tg.2098. It is noteworthy that Tg.2100, despite its similarity to Tg.2098, did not bind, possibly because the two-amino acid N-terminal truncation places the biotin group much closer to the peptide-binding cleft and thus sterically prevents binding. Intriguingly, as the character of the peptide binding cleft shifts from the susceptible to the protective signature, there is a marked diminution in peptide binding. Moreover, ~90% decreased binding was seen when replacing Arg-74 with Gln-74, whereas replacing the other two susceptible amino acids at positions 70 and 71 decreased the binding by an additional 10%. These data demonstrate that Arg-74 made the most significant contribution to the energetics of binding and are consistent with our data that Arg-74 is strongly associated with both Graves and Hashimoto diseases (6,10).

Specificity of Binding/Inhibition Studies. To demonstrate that the observed peptide binding was specific to the peptide binding cleft of the MHC II molecule, we conducted competition studies (Fig. 5) in which a biotinylated apoprotein (known to bind to the MHC II cleft) was competed off with a cold peptide of interest. The apopeptide was chosen because it is a well characterized peptide that was initially identified after being eluted from HLA-DR3 isolated from the surface of B-cell derived cell lines (31), and it is thus suited to be a "benchmark" peptide in this study. Tg.2098, Tg.726, Tg.1571, and Tg.1951 all competed with apo for binding to the MHC II peptide binding cleft (Fig. 5), demonstrating that the binding of these peptides was specific to the HLA-DR peptide binding cleft.
Cell Lines. As a complement to the recombinant HLA-DR3 system, an HLA-DR3-Arg-β74-expressing Rat2 cell line was constructed to examine the specific Tg peptide binding to MHC II without interference from any other endogenous MHC II molecules (which may occur if one were to use immortalized B-cell lines from DR3 homozygous individuals that also express other MHC II molecules such as DQ). The DR3 stably transfected line showed significant expression of the HLA-DR molecules (Fig. 6A). Moreover, all of the peptides that were observed to bind in the recombinant system were validated and showed strong binding to the HLA-DR3-expressing cell line while not binding to the non-MHCII-expressing parent Rat2 cell line (Fig. 6B).
Analysis of the Binding Energetics of the Tg Peptides. Focusing on the nine critical pocket residues (P1-P9) that are most important for peptide-MHC II interactions, we undertook an amino acid by amino acid analysis to dissect the energetics of thyroglobulin peptide binding. Table 3 lists the energies derived from each peptide residue side chain and predicts an optimal HLA-DR3 binding sequence derived from Tg.1571, Tg.1951, and Tg.726. Clearly, all the residues contribute to binding, albeit to different extents. For example, in all the peptides, contributions from position 5 are the smallest (Table 3), whereas positions 1 and 9 make major contributions. Positions 1, 6, and 9 prefer a hydrophobic residue and position 4 a negatively charged one. Positively charged residues in the center of the sequence are clearly disfavored (e.g. Tg.1951). Further decomposition of the interaction energy into contributions from backbone and side chains reveals that interactions with the backbone contribute a substantial portion of the interaction energy. In Tg.1571, Tg.726, and Tg.1951, interactions with the backbone contribute 37, 38, and 54%, respectively. These contributions are from hydrogen bonds to the peptide C=O and N-H groups, and thus they depend only weakly on the specific position, as indicated by their average and small variance compared with the side chains. In Tg.1571, the average backbone interaction is −1.9 ± 0.7 kcal/mol, whereas the average side chain interaction is −2.6 ± 1.7 kcal/mol. Clearly, the selectivity of peptide binding resides in the interaction with the side chains. This is supported by the fact that the "optimal" sequence constructed from side chain contributions is the same as that from the total interaction energies (data not shown and see Table 3).
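The bookkeeping behind this decomposition can be sketched in a few lines: sum the per-position backbone and side chain energies to obtain the backbone share, summarize each set by its mean and standard deviation, and assemble an "optimal" core by taking, at each position, the residue with the most favorable (most negative) side chain energy. The sketch below uses invented peptide cores and invented energies purely for illustration; none of the values come from Table 3, and the function names are hypothetical.

```python
from statistics import mean, stdev

# Hypothetical 9-residue cores with per-position energies in kcal/mol.
# All numbers are placeholders for illustration, NOT the Table 3 values.
PEPTIDES = {
    "pep1": {
        "core": "LSADGAVKL",
        "side": [-4.0, -1.0, -2.0, -5.0, -0.5, -3.0, -1.5, -2.0, -4.0],
        "backbone": [-2.0, -1.5, -2.5, -1.0, -2.0, -2.5, -1.5, -2.0, -1.0],
    },
    "pep2": {
        "core": "FTEDKLIAV",
        "side": [-3.0, -2.0, -3.0, -4.0, 1.0, -3.5, -1.0, -1.0, -4.5],
        "backbone": [-2.0, -2.0, -2.0, -2.0, -2.0, -2.0, -2.0, -2.0, -2.0],
    },
}


def backbone_fraction(entry):
    """Fraction of the total interaction energy contributed by the backbone."""
    side_total = sum(entry["side"])
    backbone_total = sum(entry["backbone"])
    return backbone_total / (side_total + backbone_total)


def mean_and_sd(values):
    """Mean and sample standard deviation of per-position energies."""
    return mean(values), stdev(values)


def optimal_core(peptides):
    """Pick, per position, the residue with the most negative side chain energy."""
    length = len(next(iter(peptides.values()))["core"])
    chosen = []
    for pos in range(length):
        best = min(peptides.values(), key=lambda entry: entry["side"][pos])
        chosen.append(best["core"][pos])
    return "".join(chosen)


for name, entry in PEPTIDES.items():
    print(name, round(backbone_fraction(entry), 2), mean_and_sd(entry["side"]))
print(optimal_core(PEPTIDES))
```

A larger spread in the side chain energies than in the backbone energies is what the text above interprets as selectivity residing in the side chains.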
The interaction energies between the peptide and the HLA can also be decomposed into pairwise contributions (27). Such an analysis identifies the specific residues in the peptides and in the MHC protein that make significant contributions to the interaction energy. The analysis is shown in Fig. 7A. The specific values of pairwise interactions for different peptides depend on the sequence of the peptide, but the localization of the interactions is invariant (Fig. 7A). With the exception of positions 4 and 5 in the peptide, the binding pockets are made up of residues in the α and β subunits of the MHC protein.
Interestingly, the pockets for residues 4 and 5 are entirely localized in the β subunit, and the strongest interactions are centered on Lys-β71 and Arg-β74. In Tg.1571, this is by far the most important pairwise interaction, providing a molecular mechanism for the observed changes in binding upon mutations of Arg-74 and Lys-71.
As shown pictorially for Tg.1571 in Fig. 7A, the most favorable interactions for Tg.1571 occur on the β chain and cluster around positions 71 and 74, which we have previously identified as strategic to the development of AITD (30). Our analysis determined that a negatively charged residue in position 4 or 5 of the peptide is responsible for the interaction with the positive residues in positions 71 and 74 of the HLA-DR cleft. Fig. 7B shows Tg.726 modeled within the binding cleft and highlights the loss of contacts when Arg-β74 is changed to Gln-β74.
Importance of Arg-β74. To explore the role of Arg-β74 in peptide binding, we conducted the same simulations with the three peptides interacting with DR3 in which Arg-β74 was mutated to Gln. The mutation R74Q reduces the total interaction energy with all the peptides with the following rank order: Tg.1571 > Tg.1951 > Tg.726. A more detailed analysis showed that the mutation affects primarily the interaction with the side chains as follows: 3.5 kcal/mol in Tg.1571, 3.0 kcal/mol in Tg.726, and 4.6 kcal/mol in Tg.1951. Consequently, the relative contribution of the interaction with the backbone increases and becomes the dominant contribution. A decomposition of the total interaction energy into residue-based contributions shows that the main source of the change lies in the negatively charged residues in the center of the peptide. Thus, the interaction with D4 in Tg.1571 is reduced by 3.0 kcal/mol, the interaction with E5 in Tg.726 is reduced by 1.0 kcal/mol, and in Tg.1951 the interaction with both E3 and D4 is reduced by 3.5 kcal/mol. These changes are consistent with the experimental findings and provide a molecular model for the observed changes in binding. An illustration of the changes in the interactions is shown in Fig. 7B, which shows the molecular arrangement around Lys-71 and Arg-74/Gln-74 in the complex with Tg.726. It shows a strong interaction between the two positive residues and the carboxylate of D4. In addition, Lys-71 interacts with the backbone carbonyl of A5. With Gln-74, the configuration does not change substantially compared with Arg-74, but the interaction with Gln-74 is substantially weaker, as reflected in the interaction energy as well as in the binding results. These analyses clearly illustrate that substitution in the HLA-DR pocket has both direct and indirect effects on peptide binding and selectivity.

[Displaced figure legend text: Binding is expressed as the percentage of cells that capture a given peptide of all DR3-positive cells. Tg.2673 and Tg.2507 are Tg peptides predicted not to bind and are used as negative controls (these peptides showed no detectable binding in our ELISA system; data not shown). Tg5-22 (DQVAALTWVQTHIRGFGGDPRR), which bound nonspecifically, was based on a previously reported pathogenic Tg-derived peptide (44) and was used to demonstrate the sensitivity of our system to nonspecific binding.]

DISCUSSION
Any given MHC II molecule has the ability to bind between 650 and 2000 different peptides (the peptidic repertoire) (31-33); nevertheless, despite this promiscuity, the ability to bind individual peptides with great stability is retained (reviewed in Ref. 34). In addition to being designed to present exogenous peptides and thus protect the host, the MHC II, by virtue of its exquisite design, can also present numerous self-peptides and, if autoreactive T-cells escape tolerance, cause an autoimmune response.
Critical to the initiation of an autoimmune response is the sequence of the MHC II peptide binding cleft. Indeed, concerning thyroid autoimmunity, we have previously identified unique amino acid pocket signatures in HLA-DR, which are highly associated with both GD and HT (9,30). Moreover, when co-inherited with disease-associated polymorphisms in the thyroglobulin gene, a statistical interaction arises, resulting in a significantly higher risk for disease than would be expected by a simple additive effect alone (7), suggesting the existence of a biochemical interaction between Tg peptides and the MHC II pocket signature (35). Consistent with this putative mechanism are data showing that thyroid follicular cells can express MHC II molecules in the context of autoimmunity, and can act as facultative APCs (36,37).
Interestingly, the MHC II pocket sequences examined in this study had a dramatic effect on the binding of multiple Tg peptides, as well as apo, to HLA-DR (Table 2). Three pocket signatures were tested using HLA-DR3 molecules from a recombinant E. coli system, which was validated both by specificity controls and by a stable MHC-II-expressing cell line. The pocket signatures were as follows: 1) Tyr-26β, Tyr-30β, Gln-70β, Lys-71β, and Arg-74β (all susceptible amino acids); 2) Tyr-26β, Tyr-30β, Gln-70β, Lys-71β, and Gln-74β (all susceptible amino acids except Gln-74β); and 3) Tyr-26β, Tyr-30β, Arg-70β, Arg-71β, and Gln-74β (all protective amino acids except Tyr-26β and Tyr-30β). These pocket signatures were selected based on our genetic studies, which have shown that GD is associated with only one amino acid pocket variant, DRβ-Arg-74, whereas HT requires a five-amino acid signature, Tyr-26β, Tyr-30β, Gln-70β, Lys-71β, and Arg-74β. Lys-71β was shown by logistic regression analysis to be the most strongly associated amino acid (30). Because Arg-74 was common to both GD and HT susceptibility pockets, we hypothesized that it was critical for peptide binding. Therefore, to test whether Arg-74 makes the greatest contribution to peptide binding, we created a hybrid pocket containing susceptible residues at all positions except 74, which had Gln (resistant). Our results demonstrated that Arg-74 was the key position for peptide binding. In corroboration of the binding studies, energetic analysis indicated that most of the favorable peptide-MHC II interactions are promoted by susceptibility amino acids at positions 71 and 74 of the β chain. The energetic data showed that Arg-74 forms a strong salt bridge interaction with residue Glu-5 of the peptide and induces a strained conformation of the side chain. In contrast, when Gln occupies position 74, it weakens the interaction with Glu-5 and releases the strain of the side chain away from Gln-74. Thus, Arg-74 is critical to stabilizing the Tg peptide within the DR pocket.
The same positions, located in pocket P4, which were identified to be strategic to peptide binding in the recombinant system (β70, β71, and β74), have also been found to play an important role in other autoimmune diseases (38) and immunological processes. For example, positions β70 and β71 are critical to the DR4 alleles associated with rheumatoid arthritis (39). Additionally, βQ70R and βR74Q were associated with elevated CLIP levels in cells, as they serve to increase the retention of CLIP by DR3 (40), and βQ70R has been implicated in a TCR contact (41).
From the standpoint of the peptide, this study has demonstrated that at least four thyroglobulin-derived peptides have a biological relevance to thyroid autoimmunity. Two (Tg.726 and Tg.2098) have been characterized previously as coming from human thyroids and were shown to bind to HLA-DR3, using methodology from ProImmune Ltd. (17). The two novel Tg-derived peptides that have been identified, although not coming from isolated glands, are biologically relevant nevertheless, as they were generated by cathepsins, which are the main proteases generating peptides in endosomes and are key proteases in thyroid biology. Traditional methods used to find pathogenic self-peptides, such as those applied to the acetylcholine receptor in the studies of myasthenia gravis (42), which consist of sequentially generating random peptides, are unsatisfactory, as, in addition to being very labor intensive, they may not yield peptides that have biological relevance to the etiopathogenesis of autoimmunity. Sequentially generated peptides may not be produced under in vivo conditions. By constructing a model peptidic repertoire and scanning it with validated algorithms, we have shown that it is possible to identify, with a high probability (two out of four tested), MHC II binders, without synthesizing and screening a large number of peptides. In fact, to screen the entire Tg molecule with sequentially generated random peptides would require the screening of >2,500 peptides, which is not feasible. Moreover, by furnishing a detailed analysis that documents the model peptidic repertoire that will be available in the thyroid as a result of cathepsin B, D, or L digestion, and by understanding the cutting sites, it is possible to trace the origins of a pathogenic peptide. As proof of principle, four of the eight peptides isolated from the HLA-DR of thyroids from patients with GD (17) had a very closely corresponding peptide (sequence similarities are underlined in Table 4) in the model peptide repertoire. It remains to be determined whether the novel Tg-derived HLA-DR3-binding peptides identified in this study, as well as the previously identified thyroglobulin peptides (17), are immunodominant, subdominant, cryptic, or simply self-peptides. Importantly, of all the evaluated Tg-derived HLA-DR3-binding peptides, Tg.2098 is especially interesting as it was detected using our recombinant system, as well as by two other groups; the first group used thyroid tissues from patients with Graves disease (17), and the second group conducted peptide immunizations in mice (45). Hence, these data suggest that Tg.2098 could possibly play a role in the initiation of AITD.
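The scale of an exhaustive screen mentioned above can be checked with simple arithmetic: a protein of length N yields N − k + 1 overlapping peptides of length k when the window advances one residue at a time. The sketch below assumes a Tg length of roughly 2,750 residues and a 15-residue peptide; both figures are our illustrative assumptions rather than values stated in this study, and the function name is hypothetical.

```python
def n_overlapping_peptides(protein_len, peptide_len, step=1):
    """Number of peptides obtained by sliding a fixed-length window along a protein."""
    if peptide_len > protein_len:
        return 0
    return (protein_len - peptide_len) // step + 1


ASSUMED_TG_LENGTH = 2750   # assumption: approximate length of mature human Tg
ASSUMED_PEPTIDE_LEN = 15   # assumption: a typical length for MHC II binding assays

total = n_overlapping_peptides(ASSUMED_TG_LENGTH, ASSUMED_PEPTIDE_LEN)
print(total)
```

Under these assumptions the count exceeds 2,500, in line with the estimate given in the text; a coarser step reduces the number of peptides at the cost of possibly missing binding registers.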
Finally, although the analyses in this study examined Tg digested by a single cathepsin at a time, all three proteases could work in concert in an in vivo context. Moreover, any disease or physiological process that affects the level of a given cathepsin would help to shape the character of the thyroglobulin peptidic repertoire. For example, increased cathepsin B immunoreactivity on the apical membrane of thyroid follicular cells has been shown in Graves disease (13). Consistent with this finding, TSH has been reported to induce a 2-fold enhancement of cysteine protease activities (14). In addition, cathepsins B and L are up-regulated during conditions conducive to inflammation such as exposure to interferon-γ (43). Because there is a relationship between the catabolic profile of thyroglobulin and the peptides that are ultimately processed and displayed on thyroidal cells, cathepsin or lysosomal protease inhibitors directed to the thyroid might be a novel approach to mitigating or preventing AITD. Indeed, a proof of principle of this approach to treating autoimmunity was provided by a knock-out mouse model lacking cathepsin L on a non-obese diabetic mouse background that was protected from developing autoimmune diabetes (46).