Common allotypes of ER aminopeptidase 1 have substrate-dependent and highly variable enzymatic properties

Polymorphic variation of immune system proteins can drive variability of individual immune responses. Endoplasmic reticulum aminopeptidase 1 (ERAP1) generates antigenic peptides for presentation by major histocompatibility complex class I molecules. Coding SNPs in ERAP1 have been associated with predisposition to inflammatory rheumatic disease and shown to affect functional properties of the enzyme, but the interplay between combinations of these SNPs as they exist in allotypes has not been thoroughly explored. We used phased genotype data to estimate ERAP1 allotype frequency in 2504 individuals across five major human populations, generated highly pure recombinant enzymes corresponding to the ten most common ERAP1 allotypes, and systematically characterized their in vitro enzymatic properties. We find that ERAP1 allotypes possess a wide range of enzymatic activities, up to 60-fold, whose ranking is substrate dependent. Strikingly, allotype 10, previously associated with Behçet’s disease, is consistently a low-activity outlier, suggesting that a significant percentage of individuals carry a subactive ERAP1 gene. Enzymatic analysis revealed that ERAP1 allotypes can differ in both catalytic efficiency and substrate affinity, differences that can change intermediate accumulation in multistep trimming reactions. Alterations in efficacy of an allosteric inhibitor that targets the regulatory site suggest that allotypic variation influences the communication between the regulatory and the active site. Our work defines the wide landscape of ERAP1 activity in human populations and demonstrates how common allotypes can induce substrate-dependent variability in antigen processing, thus contributing, in synergy with major histocompatibility complex haplotypes, to immune response variability and predisposition to chronic inflammatory conditions.

Major histocompatibility complex (MHC) molecules (human leukocyte antigens [HLAs] in humans) are encoded by the most polymorphic human genes, with tens of thousands of different allomorphs identified to date (1). MHC class I (MHCI) molecules bind small protein fragments (peptides) that originate from normal cellular proteins or pathogen proteins and then translocate to the cell surface to present their cargo to cytotoxic T lymphocytes (2). Polymorphic variation in MHCI predominantly affects the structure of the binding groove and allows the presentation of a large variety of peptide sequences.
MHCI molecules bind their peptide cargo in the endoplasmic reticulum (ER) with the assistance of the peptide-loading complex (3). While MHCI molecules tend to bind peptides that are between 8 and 11 amino acids long (most of them 9mers), many peptides that enter the ER can be substantially longer (4). Two ER-resident aminopeptidases, ER aminopeptidase 1 and ER aminopeptidase 2 (ERAP1 and ERAP2), catalytically process these precursor peptides and define the peptide pool that is available for binding onto MHCI (5).
The ERAP1 gene is also polymorphic, and a variety of coding SNPs confer susceptibility to human disease, most notably chronic inflammatory conditions, often in epistasis with HLA class I alleles, which emphasizes the critical role of ERAP1 in antigen presentation (6)(7)(8)(9). The genetic association of inflammatory diseases such as HLA-B27-associated ankylosing spondylitis (AS), HLA-B51-associated Behçet's disease, and HLA-A29-associated birdshot uveitis led to the hypothesis that these conditions are driven by pathogenic changes in antigen presentation as a direct result of alterations in the substrate preferences or activity of ERAP1 (10)(11)(12)(13). Several ERAP1 SNPs have been described to affect the function of the enzyme (14,15). Proposed mechanisms for this effect include direct interactions with the substrate (16), effects on conformational dynamics (17), protein expression level (18,19), or combinations of these (10). However, not all possible combinations of the nine most common coding SNPs (i.e., allotypes) (20) occur at equal frequency in the human population (9). Rather, these SNPs encode a limited palette of allotypes that are maintained at high frequency (>1%) in populations, which suggests functional asymmetry between ERAP1 allotypes. This is supported by the fact that some ERAP1 allotypes are protective, whereas others confer risk of inflammatory diseases (21). A deep understanding of the functional properties of ERAP1 allotypes, rather than individual SNPs, is critical to unraveling their physiological impact on disease.
Previous studies have described several ERAP1 allotypes. Most studies defined ERAP1 allotypes as the combination of nine coding SNPs at amino acid positions 56, 127, 276, 346, 349, 528, 575, 725, and 730. Ombrello et al. (9) reported 10 common ERAP1 allotypes in three populations of European and East Asian ancestry (n = 160). Reeves et al. reported some additional distinct allotypes discovered in small patient cohorts of AS (n = 72) (15) and oropharyngeal squamous cell carcinoma (n = 25) (22). These allotypes, however, have been controversial and have been proposed by others to be rare (23,24). While these studies have contributed to our understanding of the emerging role of ERAP1 allotypes in disease, controversy remains over which ERAP1 allotypes are common, and a systematic analysis of their functional differences has been lacking.
We used phased genotype data from the 1000 Genomes Project (25) to define common ERAP1 allotypes in 2504 individuals of five major human populations. We generated recombinant versions of the ten most common allotypes and comprehensively characterized their in vitro enzymatic properties. We find a complex landscape of large, substrate-dependent differences in enzymatic activity between allotypes, arising from effects on both catalytic efficiency and substrate affinity. Our findings suggest that ERAP1 allotypic variation has the potential to strongly synergize with MHCI alleles, in an epitope-dependent manner, to enhance immune system variability in natural human populations.

Results
Analysis of the Genome Aggregation Database, which contains 125,748 exome sequences (26), using a 1% frequency cutoff to qualify a coding missense variant as a polymorphism, revealed only ten polymorphic amino acid positions: 12, 56, 127, 276, 346, 349, 528, 575, 725, and 730. This is consistent with a previous study (9). However, position 12 lies in the signal peptide, which is normally excised after translocation of ERAP1 into the ER and thus does not appear in the mature protein, so we focused our analysis on the remaining nine positions. These nine SNPs could theoretically be combined into up to 2^9 = 512 discrete allotypes. To define which ERAP1 allotypes are common in human populations, we exploited available phased (i.e., ordered along one chromosome) genotype data from 5q15 of the 1000 Genomes Project phase 3 (27). The frequency of the nine ERAP1 SNPs in different populations is shown in Table S1. Correlations between individual SNPs that indicate linkage disequilibrium are shown in Table S2 and are generally consistent with previous studies (9). The population frequency of the most common ERAP1 allotypes is shown in Table 1. An analogous analysis using data from the UK Biobank revealed highly similar results (Table S3) (28). Strikingly, although ten common allotypes constitute 99.9% of all allotypes in the European population, some populations carry additional allotypes not reported before. Overall, we identified at least six additional allotypes with frequencies over 0.5% in at least one population (Table 1; allotypes 11-16). Our analysis not only confirmed previous results in a larger setting but also revealed significant population variability in ERAP1 allotype frequencies. Regardless, given the extensive use of allotypes 1 to 10 in the literature and their near-complete coverage of the global population (>94% globally, >99.9% in the European population), we proceeded with the functional characterization of these allotypes.
Since individuals carry two copies of the ERAP1 gene, we also analyzed the combinations of allotypes present in the 2504 samples (Fig. 1A and Tables S4 and S5). The most common combination was that of allotypes 8 and 2, followed by the 2-2 homozygous combination. Together these account for almost 20% of the global population. Interestingly, the 8-2 combination was found to be about twice as frequent (11%) as predicted from a random distribution (25.6% × 21.8% = 5.6%). We also analyzed the prevalence of two SNPs in the homologous ERAP2, namely rs2549782 and
To characterize the enzymatic properties of the common ERAP1 allotypes, we generated recombinant ERAP1 variants corresponding to each allotype listed in Table 1. The SNPs that define allotypes 1 to 10 are scattered throughout the structure of ERAP1, away from the catalytic center (Fig. S1). They can be broadly divided into two groups: (a) SNPs that lie outside the central cavity of the enzyme that normally accommodates the substrate (Fig. S1A) and (b) SNPs that line the interior surface of this cavity and may make direct interactions with the peptide substrate (Fig. S1B).
We first characterized the ERAP1 allotypes using well-established small dipeptide substrates. The specific activity for hydrolysis of the substrate L-leucine-7-amido-4-methylcoumarin (Leu-AMC) is shown in Figure 1B. Specific activities of allotypes 1 to 9 spread over approximately twofold, but allotype 10 was at least 10-fold less active (Fig. 1B and Fig. S2). Specific activity increased linearly with substrate concentration up to 150 μM, which allowed calculation of the kcat/KM ratio for Leu-AMC (Fig. 1C and Table S7). Allotype 3 was the most active of all, and allotype 10 was 18-fold less active. For a full Michaelis-Menten (MM) analysis, we employed the chromogenic dipeptide substrate L-leucine-para-nitroanilide (Leu-pNA) (31). The data were best fit by an allosteric MM model, as previously demonstrated (31,32), which allowed us to calculate the enzymatic parameters (Fig. 1D and Table S7). This analysis demonstrated that the differences in specific activity between allotypes arise both from changes in substrate affinity (Khalf) and from changes in the maximal reaction rate (Vmax). As with Leu-AMC, allotype 10 was much less active in hydrolyzing Leu-pNA, which unfortunately precluded reliable calculation of Khalf and Vmax for this allotype.
Since ERAP1 trims long, N-terminally extended precursors of antigenic peptides, we turned to more physiologically relevant longer peptides. Recently determined cocrystal structures of ERAP1 with bound 15mer and 10mer peptide analogs revealed that the peptides are processed in a large internal cavity and interact with residues that line that cavity, which can drive selectivity (16). We measured the rate of N-terminal hydrolysis of two peptides whose backbone sequences resemble the cocrystallized analogs, namely the 15mer LLRIQRGPGRAFVTI and the 10mer LLKHHAFSFK (Fig. 2, A and B). As with the small substrates, we recorded significant variation in trimming rates. Although allotype 10 was again the least efficient, processing-rate differences were less pronounced than with the smaller substrates. Notably, allotype 10 was about as efficient as allotype 9 in trimming the 15mer. Interestingly, the pattern differed markedly between the two peptides. Allotypes 4 and 6 trimmed the 15mer the fastest, whereas allotypes 5 and 7 trimmed the 10mer the fastest. These results are consistent with a complex landscape of peptide-enzyme interactions that drive specificity and suggest that the effect of allotypic variation may be substrate dependent (16,33).
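The two ways of extracting parameters from the dipeptide data described above can be sketched in Python. This is a minimal illustration, not the analysis software used in this work. It assumes rates in μM s−1 and enzyme concentration in μM. It uses NumPy's `polyfit` for the linear regime, where v = (kcat/KM)[E][S] holds when [S] is well below KM. It uses SciPy's `curve_fit` for a sigmoidal allosteric rate law of the Hill form v = Vmax[S]^h/(Khalf^h + [S]^h). That exact functional form, the function names, the starting guesses, and the synthetic example values are all assumptions made for the sketch.

```python
import numpy as np
from scipy.optimize import curve_fit


def kcat_over_km_linear(substrate_uM, rate_uM_per_s, enzyme_uM):
    """Estimate kcat/KM (uM^-1 s^-1) from the linear regime.

    When [S] << KM, v = (kcat/KM) * [E] * [S], so the slope of v versus [S]
    divided by the enzyme concentration gives kcat/KM.
    """
    slope, _intercept = np.polyfit(substrate_uM, rate_uM_per_s, 1)
    return slope / enzyme_uM


def allosteric_rate(s, vmax, k_half, h):
    """Assumed sigmoidal (Hill-type) rate law: v = Vmax * S^h / (K_half^h + S^h)."""
    return vmax * s**h / (k_half**h + s**h)


def fit_allosteric(substrate_uM, rate):
    """Fit Vmax, K_half and the Hill coefficient h to rate-vs-[S] data."""
    substrate_uM = np.asarray(substrate_uM, dtype=float)
    rate = np.asarray(rate, dtype=float)
    # Rough starting guesses; non-negative bounds keep S^h and K_half^h defined.
    p0 = (rate.max(), np.median(substrate_uM), 1.0)
    params, _cov = curve_fit(allosteric_rate, substrate_uM, rate,
                             p0=p0, bounds=(0.0, np.inf))
    return dict(zip(("vmax", "k_half", "h"), params))


# Synthetic, noise-free example data (hypothetical values, not measurements).
s_linear = np.array([5.0, 10.0, 25.0, 50.0, 100.0, 150.0])   # uM Leu-AMC
enzyme = 0.025                                                 # uM (25 nM)
v_linear = 0.02 * enzyme * s_linear                            # true kcat/KM = 0.02
efficiency = kcat_over_km_linear(s_linear, v_linear, enzyme)

s_pna = np.array([25.0, 50.0, 100.0, 200.0, 400.0, 800.0, 1600.0, 3200.0])
v_pna = allosteric_rate(s_pna, 2.0, 300.0, 1.3)                # Vmax=2, K_half=300, h=1.3
fit = fit_allosteric(s_pna, v_pna)
```

With real, noisy data the fitted Khalf and Hill coefficient can be poorly constrained when activity is very low. This is consistent with the statement above that Khalf and Vmax could not be reliably determined for allotype 10.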
To better understand the mechanism behind the variation in trimming rates, we used a recently developed assay suitable for MM analysis that follows the trimming of a 9mer antigenic epitope with the sequence YTAFTIPSI (Fig. 2C) (34). Using this assay, we calculated kcat and KM for each ERAP1 allotype (Table S8). The turnover rate (kcat), a measure of the maximal catalytic rate of hydrolysis, varied by only about twofold among allotypes 1 to 9 but was about 10-fold lower for allotype 10, indicating that this allotype is catalytically deficient. The Michaelis constant (KM), a measure of how well the enzyme recognizes the substrate, varied between allotypes by up to sixfold, with allotypes 1 and 2 having the highest affinity for the substrate. The ratio kcat/KM, a measure of the overall catalytic efficiency of the enzyme, varied up to 60-fold. This indicates that changes in kcat and KM can combine to amplify differences in trimming rates for particular substrates (Fig. 2D). Notably, allotype 10 had 60-fold lower catalytic efficiency than allotype 2 because of the combined effect of lower substrate affinity and lower turnover rate. We therefore conclude that polymorphic variation in ERAP1 can influence peptide trimming rates by affecting both catalytic efficiency and substrate recognition.
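The Michaelis-Menten analysis and the way kcat and KM differences compound in kcat/KM can be sketched as follows. This is a minimal example, not the fitting procedure of reference (34). It assumes initial rates in μM s−1 and fits v/[E] = kcat[S]/(KM + [S]) with SciPy's `curve_fit`. The helper names and all numbers are hypothetical. The final line only illustrates the arithmetic: a 10-fold change in kcat combined with a 6-fold change in KM gives a 60-fold change in kcat/KM.

```python
import numpy as np
from scipy.optimize import curve_fit


def fit_michaelis_menten(substrate_uM, rate_uM_per_s, enzyme_uM):
    """Return (kcat [s^-1], KM [uM], kcat/KM [uM^-1 s^-1]) from initial rates."""
    s = np.asarray(substrate_uM, dtype=float)
    # Normalize rates by enzyme concentration so y = v/[E] is in s^-1.
    y = np.asarray(rate_uM_per_s, dtype=float) / enzyme_uM

    def model(s, kcat, km):
        # Michaelis-Menten: v/[E] = kcat * [S] / (KM + [S])
        return kcat * s / (km + s)

    p0 = (y.max(), np.median(s))          # rough starting guesses
    (kcat, km), _cov = curve_fit(model, s, y, p0=p0)
    return kcat, km, kcat / km


def efficiency_ratio(kcat_a, km_a, kcat_b, km_b):
    """Fold difference in kcat/KM between enzymes a and b.

    Equal to (kcat_a / kcat_b) * (km_b / km_a): differences in turnover and
    in substrate affinity multiply rather than add.
    """
    return (kcat_a / km_a) / (kcat_b / km_b)


# Hypothetical noise-free data: 1 nM enzyme, kcat = 5 s^-1, KM = 20 uM.
s = np.array([1.0, 2.0, 5.0, 10.0, 20.0, 50.0, 100.0, 200.0])
v = 5.0 * 0.001 * s / (20.0 + s)
kcat, km, eff = fit_michaelis_menten(s, v, 0.001)

# Illustrative only: 10-fold higher kcat and 6-fold lower KM -> 60-fold kcat/KM.
fold = efficiency_ratio(kcat_a=10.0, km_a=1.0, kcat_b=1.0, km_b=6.0)
```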
ERAP1 trimming of antigenic epitope precursors in the ER often involves multiple trimming steps, and which intermediates accumulate can affect which peptides will bind onto MHCI (35). To evaluate the effect of ERAP1 allotypic variation on sequential trimming reactions, we followed the generation of the ovalbumin epitope SIINFEKL from the 14mer extended precursor GLEQLESIINFEKL (Fig. 3). In all cases, the 14mer was consumed and the 8mer produced, but with variable efficiencies. As before, allotype 10 was less active, and it had to be used at a 10-fold higher concentration to follow the reaction. All possible intermediates were detected in all reactions (Fig. 3A). However, we observed significant differences both in intermediate accumulation and in the overall rate of mature epitope generation (Fig. 3B). Specifically, allotypes 1 to 3 accumulated the mature epitope the fastest, whereas allotypes 5 and 7 were less efficient, in part because of slower trimming of the initial 14mer. This contrasts sharply with their ability to trim the 10mer peptide shown in Figure 2B. Interestingly, allotype 5 accumulated significant amounts of the 11mer intermediate.
Several other allotypes (4 and 6-10) accumulated the 12mer intermediate. Since peptides of 10 to 12 residues can be immunogenic, accumulation of distinct intermediates by different allotypes could contribute to differences in immune responses between individuals.
ERAP1 is an emerging pharmacological target for cancer immunotherapy and for the control of inflammatory autoimmunity, including rheumatic conditions such as AS (36,37). Given the wide distribution of common allotypes in the population, it is crucial to know whether inhibitors with clinical potential can effectively inhibit all allotypes. We performed inhibition titrations using Leu-AMC and the inhibitors DG013A and GSK849, both shown to be active in cellular assays. DG013A is a potent transition-state analog that targets the active site of the enzyme (38). GSK849 targets a regulatory site of ERAP1. It activates hydrolysis of small substrates but inhibits long-peptide hydrolysis by interfering with C-terminus recognition (34). DG013A inhibited all 10 allotypes with high potency (pIC50 between 7.2 and 7.6) (Fig. 4A). GSK849 acted as an apparent activator of small-substrate hydrolysis, as previously reported (34), and its potency varied significantly between allotypes (pXC50 between 4.8 and 6.5) (Fig. 4B). GSK849 was, however, an inhibitor of the more physiologically relevant 9mer substrate YTAFTIPSI (Fig. 4C). Figure 4D compares the pIC50 and pXC50 values for the two substrates. GSK849 was most active against allotypes 1 and 2 and least active against allotype 10. Overall, allotype activity correlated positively with the ability of GSK849 to inhibit it (Fig. 4E). The pXC50 value also correlated positively with the fractional activation observed (Fig. 4F). This surprising finding suggests that the regulatory site of ERAP1, but not the catalytic site, is sensitive to the allotypic state of the enzyme.
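A single four-parameter logistic can summarize both kinds of titration above: inhibition (pIC50) and apparent activation (pXC50). Here pX = −log10 of the half-maximal concentration in molar units. The sketch below is a minimal, hypothetical implementation with SciPy's `curve_fit`, not the software used for Figure 4. The plateau naming, the function names, the 11-point threefold dilution series from 100 μM, and the synthetic responses are assumptions. The ratio of the high- to low-concentration plateaus is included as one possible way to express fractional activation.

```python
import numpy as np
from scipy.optimize import curve_fit


def four_parameter_logistic(log_c, y_low, y_high, log_xc50, hill):
    """Response versus log10 molar concentration.

    y_low is the plateau at low compound concentration and y_high the plateau
    at high concentration: the curve falls for an inhibitor (y_high < y_low)
    and rises for an apparent activator (y_high > y_low).
    """
    return y_low + (y_high - y_low) / (1.0 + 10.0 ** ((log_xc50 - log_c) * hill))


def fit_pxc50(conc_molar, response):
    """Fit the logistic; return pXC50 = -log10(XC50 in M) and the plateaus."""
    log_c = np.log10(np.asarray(conc_molar, dtype=float))
    y = np.asarray(response, dtype=float)
    order = np.argsort(log_c)               # sort from lowest to highest dose
    log_c, y = log_c[order], y[order]
    p0 = (y[0], y[-1], np.median(log_c), 1.0)
    (y_low, y_high, log_xc50, hill), _cov = curve_fit(
        four_parameter_logistic, log_c, y, p0=p0, maxfev=10000)
    return {"pXC50": -log_xc50, "y_low": y_low, "y_high": y_high,
            "hill": hill,
            # >1 for an apparent activator, <1 for an inhibitor
            "fold_change": y_high / y_low}


# Hypothetical threefold dilution series, 11 points from 100 uM.
conc = 100e-6 / 3.0 ** np.arange(11)
log_conc = np.log10(conc)
inhib = fit_pxc50(conc, four_parameter_logistic(log_conc, 100.0, 0.0, -7.4, 1.0))
act = fit_pxc50(conc, four_parameter_logistic(log_conc, 100.0, 250.0, -5.5, 1.0))
```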
The high variability in enzymatic activity between ERAP1 allotypes suggests that individuals carrying different combinations of allotypes can differ even more in ERAP1 activity. Assuming no specific interactions between the two alleles, the total enzymatic activity of an individual would be expected to be the sum of the activities of the two alleles. One caveat is that SNPs, and therefore allotypes, may affect gene expression or protein turnover and thus steady-state protein levels. Indeed, two recent reports suggested that SNPs can affect ERAP1 expression to some degree, which could either exacerbate or ameliorate allotype activity variation (18,19). However, those studies were limited to specific SNPs rather than allotypes, and the effects on expression were relatively small, so altered expression levels are not examined here. To calculate the expected total activity of the two alleles carried by each individual, we used the activity measurements for the 9mer substrate, for which we had obtained reliable catalytic efficiency measurements (kcat/KM, Fig. 2D and Table S8). Figure 5A plots expected specific activity against population frequency. Figure 5B is a bubble chart showing the population frequency and expected enzymatic activity for each possible combination of ERAP1 allotypes. Individuals carrying different common combinations of allotypes are expected to possess a wide range (about 60-fold) of total ERAP1 activity. Most common allotype combinations fall within a narrower range of about 10-fold (cyan region, Fig. 5A). Individuals homozygous for allotype 2 are quite frequent in the global population and would be expected to have the highest ERAP1 activity (red region, Fig. 5A). Combinations of allotype 2 with allotype 8 are also very common and have moderate-to-high enzyme activity (red region, Fig. 5A).
Several moderately active combinations involving allotype 9 are rare in the population, as is the [4,4] homozygous combination. Finally, individuals homozygous for allotype 10 are found in 1.2% of the global population and should have the lowest ERAP1 activity, being functional knockouts for some peptide substrates. Activity and frequency distributions varied significantly between populations (Fig. S3). This may signify local host-pathogen balancing selection pressures, a notion previously suggested for individual SNPs (39).
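The additive model used above for an individual's total ERAP1 activity can be written out as a short sketch. It assumes that the two alleles act independently and are expressed equally, as stated in the text, so that the expected activity of an allotype pair is the sum of the two per-allotype kcat/KM values. The function names are hypothetical. The activity values in the example are illustrative placeholders, not the measurements in Table S8.

```python
from itertools import combinations_with_replacement


def diploid_activities(activity_by_allotype):
    """Expected total activity for every unordered allotype pair.

    Assumes no interaction between the two alleles and equal expression, so
    an individual's activity is the sum of the two allotype activities.
    """
    labels = sorted(activity_by_allotype)
    return {
        (a, b): activity_by_allotype[a] + activity_by_allotype[b]
        for a, b in combinations_with_replacement(labels, 2)
    }


def fold_range(values):
    """Ratio of the highest to the lowest expected activity."""
    values = list(values)
    return max(values) / min(values)


# Hypothetical relative kcat/KM values for three allotypes (not measured data).
example = {1: 8.0, 2: 10.0, 10: 0.17}
pairs = diploid_activities(example)     # (1, 1), (1, 2), ..., (10, 10)
spread = fold_range(pairs.values())     # highest pair / lowest pair
```

Pairing these expected activities with observed genotype-combination frequencies gives the kind of activity-versus-frequency view summarized in Figure 5.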

Discussion
A wealth of genetic association studies has linked ERAP1 polymorphic variation to the existence of coding SNPs (7, 40), which have been reported to affect enzyme function, thus providing mechanistic support for observed epistasis between ERAP1 and HLA (14,15,22,29). In general, the functional effects of single SNPs are relatively modest but complex and hint at a poorly understood synergism between SNPs. This is highlighted when common allotypes harbor SNPs that protect against disease alongside SNPs that predispose to it (21). Understanding the complex patterns of disease susceptibility observed in genetic association studies therefore requires detailed knowledge of the functional differences between naturally occurring allotypes, not just SNPs. Our results suggest that SNPs can synergize to affect function in a substrate-dependent manner. This is probably best illustrated by allotype 10, which is up to 60-fold less active than other allotypes. Such a strong functional change has not been reported before for individual SNPs in ERAP1 and is probably the result of synergism. Specifically, allotype 10 carries the SNP 528R, which has been reported to associate with decreased risk of AS and psoriasis and to reduce enzyme activity by about twofold (13,14,20,41). It also carries the SNP 575N, which has been reported to affect enzymatic activity depending on the polymorphic state of position 528 (42). These two SNPs may synergize with two additional SNPs unique to this allotype. The first is 349V, which is relatively close to the catalytic site, although it is a conservative substitution that has not been reported to affect activity in isolation. The second is 725Q, which has been shown to reduce enzymatic activity (43). It lies at the interface between domains II and IV and could alter the dynamics of the conformational change of ERAP1, similar to position 528 (44).
Synergism between these four SNPs could underlie the greatly reduced catalytic efficiency of allotype 10. Strikingly, this reduction appears to be partially substrate dependent: in this allotype, trimming of the 10mer substrate was less affected than trimming of the other substrates we tested. This could be due to unproductive interactions between the C-terminal side chain of this peptide and 725R, as observed in a recent crystal structure, which would make the substitution 725Q favorable for this particular substrate (16). This observation highlights a pattern that recurs throughout our results: while some differences in activity between allotypes are consistent, they can be influenced by the substrate. This is remarkably reminiscent of the effects of polymorphic variation in MHC molecules. There, changes in the shape and dynamics of the peptide-binding groove affect the binding of different peptides both thermodynamically and kinetically and contribute to the variability of immune responses (45). ERAP1 allotypes appear to operate on the same general principle as MHC haplotypes. They induce substrate-dependent variability in substrate processing and thus contribute, in tandem with MHC polymorphic variation, to immune response variability within the population. Dissecting how each SNP contributes to the activity of each allotype is difficult without additional structural information, but some insight can still be extracted. Previous studies demonstrated that 528K results in higher enzymatic activity, possibly because of effects on the conformational dynamics of the enzyme (44). Accordingly, allotypes 1 to 3, all of which carry this SNP, are among the most active in our assays. The polymorphism 730Q has been suggested to affect activity through changes in interaction with the C-terminal moiety of the peptide (7,44).
Accordingly, allotypes 1 and 2, which carry this SNP, are more active against the 9mer substrate, which has a hydrophobic C-terminal side chain. Allotypes 3 to 10 carry the negatively charged 730E at that location and would be expected to interact less favorably with this substrate; all of them are less active than allotypes 1 and 2. Furthermore, allotypes that carry the 127P polymorphism (also reported to be associated with AS (21)) tend to have lower activity, although no specific effects of this SNP have been reported before (7). This polymorphic position lies on a putative substrate exit channel. The proline residue may reduce structural flexibility at the mouth of this channel and slow product-substrate exchange, leading to lower apparent activity (16). Finally, the sensitivity of the regulatory site to inhibition appears to be allotype dependent. This is consistent with a previous study from our group suggesting that the regulatory site communicates with the active site through effects on the conformational dynamics of the enzyme (32). Further structural analysis of these common ERAP1 allotypes will be invaluable in dissecting the synergism between SNPs that underlies functional differences between allotypes.
Although ERAP2 has been described as an accessory aminopeptidase that supplements ERAP1 activity, recent genetic association data have pointed to important roles in both cancer immunotherapy and autoimmunity (46)(47)(48). The rs2248374 polymorphism leads to lack of protein expression (30), so its effects on functional ERAP2 activity in the cell can be pronounced. Individuals homozygous for the G allele express no full-length, enzymatically active ERAP2, and heterozygous individuals express half the canonical protein amount.
Although ERAP2 has different specificity than ERAP1 (49,50), the two are the only known ER-resident aminopeptidases, so together they define the spectrum of aminopeptidase activity in the ER. Our genetic and enzymatic analysis suggests the following pattern. The highest-ERAP2-expression genotype is found more frequently in individuals who carry an intermediate-activity ERAP1 allotype (allotype 8). Conversely, individuals with reduced ERAP2 protein expression are more often homozygous for the most active ERAP1 allotype (allotype 2). This is in line with previous genetic analysis of the co-occurrence of ERAP1-ERAP2 haplotypes (10). Balancing selection may therefore act to normalize total aminopeptidase activity in the ER.
Our in vitro analysis allows accurate determination of fundamental molecular properties, but how these properties translate into changes in antigen presentation by cells is not always straightforward. Antigen presentation is primarily controlled by peptide binding onto MHCI, and this process may mask some changes in ERAP1 activity. Indeed, recent studies have suggested that this phenomenon limits the potential of ERAP1 to influence antigen presentation (51,52). Even so, many recent cell-based studies using proteomic approaches have provided strong support that changes in ERAP1 activity due to polymorphic variation translate into changes in antigen presentation (53). Also, variants that alter ERAP1 expression by interfering with splicing (18) are in linkage disequilibrium with variants that encode distinct allotypes. Nevertheless, genetic association studies have shown that for some HLA-associated conditions the disease risk maps primarily to enzymatic activity (10). Thus, although MHCI binding is the dominant filter, changes in ERAP1 activity are highly relevant to antigen presentation and probably underlie part of the known disease associations.
In summary, we provide a current allotype and genotype analysis of the ERAP1 gene in human populations and a detailed enzymatic characterization of the 10 most common ERAP1 allotypes. Our results suggest that individual SNPs synergize to shape allotype enzymatic properties by affecting both catalytic efficiency and substrate affinity. Our analysis defines a landscape of ERAP1 genotype activities in human populations that spans two orders of magnitude. It also suggests that ERAP1/2 genotypes operate in tandem with MHCI haplotypes to generate the plurality in antigen presentation that supports the observed variability of immune responses between individuals.

Experimental procedures

Materials
Leu-pNA and Leu-AMC were purchased from Sigma-Aldrich. The peptide LLKHHAFSFK was purchased from Genecust, and LLRIQRGPGRAFVTI from JPT Peptide Technologies. YTAFTIPSI was purchased from BioPeptide Inc, and the ovalbumin peptides were from CRB Discovery. All peptides were HPLC purified to >95% purity, and mass spectrometry confirmed the correct mass. Inhibitor DG013A was synthesized as described previously (38), and inhibitor GSK849 was obtained, purified, and characterized as described previously (16).

ERAP1 allotype estimation in the human population
Available phased genotype data for nine coding SNPs at 5q15 in 2504 samples of 26 ethnic groups of European, African, East Asian, South Asian, and mixed American ancestry were obtained from the 1000 Genomes Project phase 3 (25). Phased genotypes were used to estimate the allotypes on each chromosome based on 9-SNP haplotypes, which occur in >1% of all populations. Combinations of the estimated allotype frequencies were plotted using the R package ggplot2 (54).
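Allotype and genotype-combination frequencies can be tabulated from phased data in the way described above. Here is a minimal sketch. It assumes the nine SNPs have already been extracted from the VCF files and encoded per chromosome as a tuple with one allele code per position (0 = reference, 1 = alternate). That encoding, the function names, and the toy samples are hypothetical. Each sample contributes two haplotypes and one unordered haplotype pair. `2 ** len(POSITIONS)` reproduces the 512 theoretically possible nine-SNP combinations, and the 1% cutoff mirrors the threshold stated in the text.

```python
from collections import Counter

# Order of the nine coding positions used to build each haplotype key.
POSITIONS = (56, 127, 276, 346, 349, 528, 575, 725, 730)


def allotype_frequencies(phased_samples):
    """Haplotype and unordered-pair frequencies from phased genotypes.

    phased_samples: iterable of (hap1, hap2), where each haplotype is a tuple
    with one allele code per entry in POSITIONS (0/1 encoding is assumed).
    """
    hap_counts, pair_counts, n = Counter(), Counter(), 0
    for hap1, hap2 in phased_samples:
        hap_counts.update([hap1, hap2])                 # two chromosomes
        pair_counts[tuple(sorted((hap1, hap2)))] += 1   # order-independent
        n += 1
    hap_freq = {h: c / (2 * n) for h, c in hap_counts.items()}
    pair_freq = {p: c / n for p, c in pair_counts.items()}
    return hap_freq, pair_freq


def common(freqs, cutoff=0.01):
    """Keep entries with frequency above the cutoff (1% by default)."""
    return {k: f for k, f in freqs.items() if f > cutoff}


n_possible = 2 ** len(POSITIONS)            # theoretical nine-SNP haplotypes

# Toy example with two hypothetical haplotypes and four samples.
hap_a = (0,) * 9
hap_b = (1,) + (0,) * 8
samples = [(hap_a, hap_a), (hap_a, hap_b), (hap_b, hap_a), (hap_b, hap_b)]
hap_freq, pair_freq = allotype_frequencies(samples)
```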

Gene constructs, protein expression, and purifications
The sequences corresponding to full-length ERAP1 allotypes with a noncleavable C-terminal 6×His tag were codon optimized for insect cell expression, and the genes were chemically synthesized (GenScript Biotech). Each gene was then ligated into BamHI/XhoI-linearized pFASTBAC1 vector using T4 DNA ligase. The ligation product was transformed into competent cells, and positive clones were selected by single-colony screening and DNA sequencing. Positive plasmids were further verified by digestion with BamHI and XhoI. Protein expression was performed as previously described, except that the C-terminal His tag was not removed (32). Concentrations of protein stock solutions were determined spectrophotometrically using an extinction coefficient of 171,200 M−1 cm−1 at 280 nm. Protein purity and integrity were estimated by SDS-PAGE and size-exclusion chromatography (using a TSK G3000SW or an Agilent AdvanceBio SEC 300Å column) and were over 95% (Figs. S4 and S5). All protein variants were monomeric and monodisperse. Stability toward aggregation was tested by size-exclusion chromatography after repeated freeze-thaw cycles. DNA sequencing confirmed that each ERAP1 allotype carried the appropriate combination of SNPs (Fig. S6).
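The spectrophotometric concentration determination above follows the Beer-Lambert law, c = A280/(ε·l). The sketch below uses the stated extinction coefficient of 171,200 M−1 cm−1. The function name, the optional dilution factor, and the absorbance readings in the example are hypothetical.

```python
# Extinction coefficient at 280 nm stated in the text (M^-1 cm^-1).
EPSILON_280 = 171_200.0


def concentration_uM(a280, path_length_cm=1.0, dilution_factor=1.0):
    """Beer-Lambert law: c = A / (epsilon * l), corrected for any dilution."""
    molar = a280 * dilution_factor / (EPSILON_280 * path_length_cm)
    return molar * 1e6                      # convert M to uM


stock = concentration_uM(0.856)                             # hypothetical reading, 1 cm cell
diluted = concentration_uM(0.428, dilution_factor=10.0)     # 1:10 dilution measured
```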

Enzymatic assays with dipeptide substrates
The enzymatic activity of ERAP1 was measured using the fluorogenic dipeptide substrate Leu-AMC as previously described (55). Briefly, the change in fluorescence at 460 nm (excitation at 380 nm) was followed over time using a Tecan Spark 10M plate reader. A standard curve of aminocoumarin was used to convert the signal to product concentration. For MM measurements, the dipeptide substrate Leu-pNA was used, and the generation of pNA was followed by measuring absorbance at 405 nm as described previously (55). For experiments measuring the rate of hydrolysis versus enzyme or substrate concentration, Leu-AMC measurements were made in a buffer of 20 mM Hepes (pH 7.0), 100 mM NaCl, 0.002% Tween 20. Deep 384-well plates were used at a final volume of 50 μl/well at 20 °C, with a Tecan M1000 plate reader set to 360 nm excitation and 460 nm emission (5 nm bandpass on both monochromators). Measurements in which enzyme concentration was varied were set up using a multichannel pipette and initiated by adding Leu-AMC to a final concentration of 25 μM. Measurements in which substrate concentration was varied were set up using a Hewlett Packard D300 digital dispenser. These were initiated by adding ERAP1 to a final concentration of 25 nM (allotypes 1-9) or 250 nM (allotype 10). An aminocoumarin standard curve was used to convert fluorescence intensity to product concentration, after which initial rates were obtained by a linear fit of the early region of the time courses.
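The two data-processing steps described above, conversion of fluorescence through an aminocoumarin standard curve and a linear fit of the early part of each time course, can be sketched as follows. This is a minimal illustration, not the authors' pipeline. The number of early points treated as linear (`n_early`) is an assumed parameter, since the text does not specify the window. The synthetic time course includes slight curvature to show why only the early region is fitted.

```python
import numpy as np


def standard_curve(amc_uM, fluorescence):
    """Linear fit of fluorescence = slope * [AMC] + intercept."""
    slope, intercept = np.polyfit(amc_uM, fluorescence, 1)
    return slope, intercept


def initial_rate(time_s, fluorescence, slope, intercept, n_early=10):
    """Convert a time course to product concentration and fit its early part.

    n_early (the number of early time points treated as linear) is an assumed
    parameter; the appropriate window depends on the data.
    """
    product_uM = (np.asarray(fluorescence, dtype=float) - intercept) / slope
    t = np.asarray(time_s, dtype=float)
    rate, _offset = np.polyfit(t[:n_early], product_uM[:n_early], 1)
    return rate                             # uM product per second


# Hypothetical standard curve and time course (synthetic, noise-free).
amc = np.array([0.0, 1.0, 2.0, 5.0, 10.0])
cal_slope, cal_intercept = standard_curve(amc, 1000.0 * amc + 50.0)
t = np.arange(0.0, 600.0, 10.0)
product = 0.01 * t * np.exp(-t / 2000.0)    # rate slowly declines over time
signal = 1000.0 * product + 50.0
v0 = initial_rate(t, signal, cal_slope, cal_intercept)
```

Fitting the whole 600 s trace instead of the first points would underestimate the initial rate more, because the synthetic rate declines over time.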

Enzymatic assays with peptides
Trimming of peptides LLRIQRGPGRAFVTI and LLKHHAFSFK was performed as described previously (16). Briefly, 20 μM peptide and 1 nM enzyme were mixed at a final assay volume of 200 μl in assay buffer of 20 mM Hepes (pH 7.0), 100 mM NaCl, 0.002% Tween 20, and incubated for 30 min at 37 °C. Reactions were carried out in triplicate for each allotype, stopped by freezing, and stored at −80 °C until analysis by HPLC. For trimming assays using the YTAFTIPSI substrate, measurements were made in an assay buffer of 20 mM Hepes (pH 7.0), 100 mM NaCl, 0.002% Tween 20, in deep 384-well plates at a final assay volume of 25 μl/well and a temperature of 20 °C. For the MM kinetics, a range of YTAFTIPSI concentrations was dispensed using a Hewlett Packard D300 digital dispenser, and reactions were initiated by adding ERAP1 to a final concentration of 1 nM (allotypes 1–9) or 10 nM (allotype 10). Reactions were stopped after 60 min of incubation by addition of 25 μl of 0.75% TFA in water containing 5 μM Ac-YTAFTIPSI as internal standard. Mass signal intensities corresponding to the product (TAFTIPSI) and internal standard (Ac-YTAFTIPSI) were measured on a RapidFire autosampler (Agilent) equipped with a C18 solid-phase cartridge, coupled to a Sciex 4000 QTRAP mass spectrometer (AB Sciex), using multiple reaction monitoring, as described previously (34). Integrated product signal intensities were normalized to the respective internal standard signal intensities before conversion to product concentration using a TAFTIPSI standard curve. Turnover did not exceed 35% at any substrate concentration. Data were fitted to the MM equation as described previously (34). To measure inhibition by GSK849, a threefold dilution series of inhibitor was made in dimethyl sulfoxide, and 250 nl of each concentration point was dispensed into a 384-well assay plate using an Echo acoustic dispenser. ERAP1 was added, followed by YTAFTIPSI substrate at a final concentration of 5 μM.
Final ERAP1 concentrations varied by allotype as follows: 0.5 nM for allotype 3; 1 nM for allotypes 1, 2, 4, 5, 6, 8, and 9; 2 nM for allotype 7; and 10 nM for allotype 10. After incubation for 60 min, reactions were stopped and measured as described above. Calibration curves were measured for substrate (YTAFTIPSI) and product (TAFTIPSI) and used to correct for the difference in detection sensitivity between the two analytes. The corrected intensity data were used to calculate percent turnover, from which product concentration was calculated. Data were normalized to percent enzymatic activity between high (uninhibited) and low (100 μM DG13) controls and fitted to a four-parameter logistic equation as described previously (34).
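The inhibition data processing described above can be sketched as follows: correct the substrate and product signals by their calibration slopes, compute percent turnover and product concentration (from the 5 μM starting substrate), scale each well between the low and high controls, and evaluate a four-parameter logistic (4PL) model. This is an illustrative sketch under stated assumptions, not the authors' pipeline: signals are assumed to be already normalized to the internal standard, calibrations are assumed linear through the origin, substrate and product are assumed to account for all material, and the 4PL form shown is one common parameterization. All function names and numbers are hypothetical.

```python
from typing import List


def percent_turnover(product_signal: float, substrate_signal: float,
                     product_slope: float, substrate_slope: float) -> float:
    """Percent substrate converted, corrected for detection sensitivity.

    Assumes IS-normalized signals and linear calibrations through the origin
    (signal = slope * concentration) for each analyte.
    """
    product = product_signal / product_slope
    substrate = substrate_signal / substrate_slope
    return 100.0 * product / (product + substrate)


def product_concentration(turnover_pct: float, substrate_total_um: float = 5.0) -> float:
    """Product formed (uM), given the starting substrate concentration."""
    return turnover_pct / 100.0 * substrate_total_um


def percent_activity(value: float, high_ctrl: float, low_ctrl: float) -> float:
    """Scale to 0 % (low control, fully inhibited) up to 100 % (uninhibited)."""
    return 100.0 * (value - low_ctrl) / (high_ctrl - low_ctrl)


def four_parameter_logistic(x: float, bottom: float, top: float,
                            ic50: float, hill: float) -> float:
    """One common 4PL form; it equals the midpoint (top + bottom) / 2 at x = ic50."""
    return bottom + (top - bottom) / (1.0 + (x / ic50) ** hill)


def threefold_series(top_conc: float, n_points: int) -> List[float]:
    """Inhibitor concentrations for a threefold serial dilution."""
    return [top_conc / 3 ** i for i in range(n_points)]


if __name__ == "__main__":
    # Hypothetical 11-point series starting at 10 uM inhibitor.
    for conc in threefold_series(10.0, 11):
        # Hypothetical 4PL parameters used only to generate example values.
        act = four_parameter_logistic(conc, 0.0, 100.0, 0.3, 1.0)
        print(f"{conc:10.5f} uM -> {act:5.1f} % activity")
```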
To follow the trimming of ovalbumin epitope precursors, measurements were made in an assay buffer of 50 mM Hepes (pH 7.0), 100 mM NaCl, 0.002% Tween 20, in deep 384-well plates at a final assay volume of 50 μl/well and a temperature of 20 °C. ERAP1 (allotypes 1–9 at 10 nM final concentration, allotype 10 at 100 nM final concentration) was mixed with the 14-mer peptide GLEQLESIINFEKL (25 μM final concentration). Reactions in individual wells were stopped at increasing time points by the addition of 50 μl of 0.75% TFA in water containing 5 μM Ac-YTAFTIPSI as internal standard. Mass signal intensities corresponding to the seven ovalbumin peptide species (14-mer GLEQLESIINFEKL, 13-mer LEQLESIINFEKL, 12-mer EQLESIINFEKL, 11-mer QLESIINFEKL, 10-mer LESIINFEKL, 9-mer ESIINFEKL, and 8-mer SIINFEKL) and the internal standard (Ac-YTAFTIPSI) were measured using the RapidFire system as described above. Signals at each peptide mass were divided by the respective internal standard signal, and the normalized data were converted to peptide concentrations using calibration curves for each peptide species.
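The seven monitored species are the successive products of removing one N-terminal residue at a time from GLEQLESIINFEKL down to SIINFEKL. The sketch below generates that series and converts raw MS signals at one time point to concentrations by dividing by the internal standard signal and inverting a per-species linear calibration. The function names, the linear calibration form and the example numbers are hypothetical assumptions. Summing the species gives only a rough mass-balance check against the 25 μM precursor, because any peptide outside the seven monitored species is not counted.

```python
from typing import Dict, List, Tuple


def n_terminal_trimming_series(precursor: str, final_length: int) -> List[str]:
    """Successive products of removing one N-terminal residue at a time."""
    if not 0 < final_length <= len(precursor):
        raise ValueError("final_length must be between 1 and len(precursor)")
    return [precursor[i:] for i in range(len(precursor) - final_length + 1)]


def signals_to_concentrations(raw_signals: Dict[str, float], is_signal: float,
                              calibration: Dict[str, Tuple[float, float]]) -> Dict[str, float]:
    """Normalize each species to the internal standard, then invert its calibration.

    calibration maps species -> (slope, intercept) of normalized signal versus
    concentration in uM, assumed linear over the working range.
    """
    concentrations = {}
    for peptide, raw in raw_signals.items():
        slope, intercept = calibration[peptide]
        concentrations[peptide] = (raw / is_signal - intercept) / slope
    return concentrations


if __name__ == "__main__":
    species = n_terminal_trimming_series("GLEQLESIINFEKL", 8)
    for pep in species:
        print(f"{len(pep)}-mer {pep}")
    # Hypothetical calibration (identical for every species) and one time point.
    calibration = {pep: (0.1, 0.0) for pep in species}
    raw = {pep: 0.0 for pep in species}
    raw["GLEQLESIINFEKL"] = 1500.0  # corresponds to 15 uM remaining precursor
    raw["SIINFEKL"] = 1000.0        # corresponds to 10 uM epitope
    conc = signals_to_concentrations(raw, is_signal=1000.0, calibration=calibration)
    print(f"sum of monitored species = {sum(conc.values()):.1f} uM")
```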

Data availability
All data described are available in the article and associated supporting information. Numerical values used for generation of graphs are available upon request to the corresponding author (Efstratios Stratikos; E-mail: stratos@rrp.demokritos.gr or estratikos@chem.uoa.gr).
Supporting information: This article contains supporting information.