Large-scale analysis of variation in the insulin-like growth factor family in humans reveals rare disease links and common polymorphisms

The insulin-like growth factors IGF1 and IGF2 are closely related proteins that are essential for normal growth and development in humans and other species and play critical roles in many physiological and pathophysiological processes. IGF actions are mediated by transmembrane receptors and modulated by IGF-binding proteins. The importance of IGF actions in human physiology is strengthened by the rarity of inactivating mutations in their genes and by the devastating impact caused by such mutations on normal development and somatic growth. Large-scale genome sequencing has the potential to provide new insights into human variation and disease susceptibility. Toward this end, the availability of DNA sequence data from 60,706 people through the Exome Aggregation Consortium has prompted the analyses presented here. Results reveal a broad range of potential missense and other alterations in the coding regions of every IGF family gene, but the vast majority of predicted changes were uncommon. The total number of different alleles detected per gene in the population varied over an ∼15-fold range, from 57 for IGF1 to 872 for IGF2R, although when corrected for protein length the rate ranged from 0.22 to 0.59 changes/codon among the 11 genes evaluated. Previously characterized disease-causing mutations in IGF2, IGF1R, IGF2R, or IGFALS all were found in the general population but with allele frequencies of <1:30,000. A few new highly prevalent amino acid polymorphisms were also identified. Collectively, these data provide a wealth of opportunities to understand the intricacies of IGF signaling and action in both physiological and pathological contexts.

The insulin-like growth factors, IGF1 and IGF2, are closely related, single-chain, secreted peptides that are essential for normal growth and development in mammals and other vertebrates, play critical roles in many physiological and pathophysiological processes in multiple species (1)(2)(3)(4), and promote proliferation, differentiation, and/or survival of a variety of cell and tissue types (5,6). Along with insulin (INS), the two IGFs define a conserved gene family that is found in many eukaryotes (7)(8)(9). Both IGFs exert their biological effects through binding to the IGF1 receptor (IGF1R), a transmembrane ligand-activated tyrosine-protein kinase related to the insulin receptor (INSR) in amino acid sequence and structure (6,10,11). IGF2 also can signal after binding to a variant INSR lacking 12 amino acids in its juxtamembrane region (12) and binds with high affinity to the IGF2R, which primarily acts to limit access of IGF2 to signaling receptors through its sequestration, internalization, and degradation (13)(14)(15). Along with the cation-dependent mannose 6-phosphate receptor (CDMPR), the IGF2R is additionally one of two receptors responsible for targeting lysosomal enzymes to the lysosome (13,16).
In extracellular fluid and in the bloodstream, IGFs are normally bound to IGF-binding proteins (IGFBPs), which modulate IGF actions by regulating their half-life and access to cell surface signaling receptors (17)(18)(19). The six IGFBPs are secreted molecules of 201-289 amino acids in length that share ∼36% sequence identity (18,19). Each IGFBP is composed of cysteine-rich conserved NH2- and COOH-terminal domains along with a less conserved central linker segment (18,20,21). Many studies have indicated that IGFBPs additionally control biological processes that are potentially independent of their IGF binding properties (18,22-25).
The other key player in IGF biology is the IGF acid labile subunit (IGFALS), a 578-amino acid protein primarily produced by the liver that forms a ternary complex in the circulation with IGFBP3 or IGFBP5 when the latter binds either IGF1 or IGF2 (26,27). IGFALS is unrelated in amino acid sequence or structure to other IGF family components and consists of a series of leucine-rich segments of ∼22 residues each flanked by NH2- and COOH-terminal cysteine-rich domains (28). The ternary complex greatly extends IGF half-life in the circulation (26,27).
Human diseases caused by mutations or changes in levels of IGF family members are uncommon (29,30). Only a handful of individuals with homozygous or heterozygous IGF1 gene defects have been reported (31)(32)(33)(34)(35)(36)(37), with homozygous individuals having very low IGF1 levels, severe intrauterine and post-natal growth deficiency, and other developmental and intellectual abnormalities (31)(32)(33), reflecting the multifunctional nature of IGF1 actions (35). A single heterozygous nonsense mutation has been described in the human IGF2 gene in four members of a family with severe pre- and post-natal growth retardation (38). Developmental disorders with altered expression of IGF2 include Silver-Russell syndrome, which is caused by abnormalities on chromosome 11p15 in the region of the parentally imprinted IGF2 locus (39,40) (the IGF2 gene is normally expressed solely from the paternally derived chromosome (39,40)). Individuals with Silver-Russell syndrome produce diminished amounts of IGF2, exhibit reduced fetal and post-natal somatic growth, and have a variety of dysmorphic features and bodily asymmetry (39,40). Conversely, overexpression of IGF2 accompanies Beckwith-Wiedemann syndrome, which is characterized primarily by asymmetric overgrowth (39,40). A handful of mostly heterozygous amino acid substitution mutations also have been identified in the IGF1R in children who were small for gestational age and who failed to exhibit post-natal "catch-up" growth (41)(42)(43)(44)(45). Frameshift and amino acid substitution mutations additionally have been detected in the IGFALS gene in a few children with moderate short stature but no other consistent physiological defects except a lack of IGFALS in the blood (46,47). To date no disease-associated abnormalities have been described for any of the six human IGFBPs (Ref. 30; see OMIM).
Amino acid substitutions have been identified in the IGF2R in a small number of presumably healthy adults (48), and others have been found in several individuals with liver cancer (49).
Population-based genome sequencing has the potential to provide new insights into human variation, disease susceptibility, and evolution (50-52). The recent release of DNA sequence data from >60,500 people via the efforts of the Exome Aggregation Consortium (ExAC) (53)(54)(55)(56)(57) has prompted dissection of this information to gain insights into the population genetics of the IGF system. Results reveal a broad range of potential missense and other alterations in the coding regions of every IGF family gene, with the vast majority of predicted changes being uncommon. Taken together, these data will provide new opportunities to understand the intricacies of IGF signaling and action in physiological and pathological contexts.

Allelic variation in IGF family members in humans
ExAC contains DNA sequencing data from the exomes of 60,706 people derived from different population groups from around the world (53). One general conclusion from initial analysis of these 121,412 alleles is that there is substantial variation within the coding regions of genes (53). However, most predicted modifications were found to be uncommon, with more than half being detected in just a single allele and with >99% being observed in <1% of the entire study population (53). The vast majority of the variation found consisted of synonymous changes and amino acid substitutions (53).
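The relationship between individuals, alleles, and the frequency thresholds quoted above can be checked with a few lines of arithmetic. The following Python sketch is illustrative only: the function names are hypothetical, and it assumes that every individual contributes two called alleles at every site, which real exome data do not guarantee because sequencing coverage varies by position.

```python
# Illustrative arithmetic only; all names here are hypothetical.
# Assumption: each of the 60,706 individuals contributes two called alleles
# at every site (diploid, autosomal, full coverage).
N_INDIVIDUALS = 60_706
TOTAL_ALLELES = 2 * N_INDIVIDUALS  # 121,412 alleles, as stated in the text


def allele_frequency(allele_count: int, total_alleles: int = TOTAL_ALLELES) -> float:
    """Return the fraction of all alleles that carry a given variant."""
    if not 0 <= allele_count <= total_alleles:
        raise ValueError("allele_count must lie between 0 and total_alleles")
    return allele_count / total_alleles


def max_count_below(threshold: float, total_alleles: int = TOTAL_ALLELES) -> int:
    """Largest allele count whose frequency is strictly below `threshold`."""
    count = int(threshold * total_alleles)
    # Correct in either direction for floating-point truncation.
    while count > 0 and count / total_alleles >= threshold:
        count -= 1
    while count < total_alleles and (count + 1) / total_alleles < threshold:
        count += 1
    return count


if __name__ == "__main__":
    print(TOTAL_ALLELES)                  # total alleles in the data set
    print(f"{allele_frequency(1):.2e}")   # frequency of a singleton variant
    print(max_count_below(0.01))          # most copies a "<1%" variant can have
```

Under these assumptions a variant seen on a single chromosome has a frequency of 1 in 121,412, and any variant described as present in <1% of the population is carried on at most a little over 1,200 alleles.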
Examination of IGF family members in ExAC revealed a potentially wide range of coding variation in their exons, with most of the changes consisting of missense mutations (83-99% of modified alleles, depending on the gene; Tables 1-3). Second most common were alterations in the reading frame, including inserted stop codons (1% to ∼10%, Tables 1-3). The total number of different allelic variants per gene varied over an ∼15-fold range, from 57 for IGF1 to 872 for IGF2R, with genes encoding smaller proteins generally having fewer changes by a factor of 4-16 than those with larger proteins (compare IGF1, IGF2, and IGFBP1-6 with IGF1R, IGF2R, and IGFALS; Tables 1-3). However, when corrected for protein length, the overall range of variation was very similar for 9 of 11 genes (0.28-0.39 nonsynonymous changes/codon), with the outliers being IGFBP4 (0.22) and IGFALS (0.58). Moreover, when examined for their prevalence, >97% of the missense alleles were detected in ≤0.1% of the study population, and 99.4% were found in ≤1.0% (Table 4). These results indicate that variation in human IGF family proteins is low in the population and are consistent with the overall conclusions from ExAC as noted above (53).
Splicing changes at exon-intron and intron-exon junctions, reading frame alterations, and the addition of stop codons each can contribute to loss of protein expression and thus a lack of function. The number of alleles showing these changes was very low among IGF family genes and ranged from 1 to 24 different instances, with most alterations being detected rarely in the population (0.002-0.2% allelic frequency), although IGFBP3, at 0.9%, and IGF2, at 2.0%, were exceptions (Table 5). Similarly, copy number variation, in which all or part of a gene is amplified in the genome, also was very low for IGF family members in the study population, as seen with most genes in ExAC (54), ranging from none for IGFALS to fewer than 7 instances for IGF1, IGF2, IGF1R, and IGFBP1-6 to 26 for IGF2R.
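The length correction and the fold range described above amount to two simple ratios. The Python sketch below is a hypothetical illustration, not the procedure used to build Tables 1-3: the function names are invented, and the 100-variant, 250-codon gene in the usage example is a made-up input, because the codon counts used for each gene appear only in the tables.

```python
from typing import Iterable


def changes_per_codon(n_variants: int, n_codons: int) -> float:
    """Normalize a count of distinct nonsynonymous variants by coding length."""
    if n_codons <= 0:
        raise ValueError("n_codons must be positive")
    if n_variants < 0:
        raise ValueError("n_variants must be non-negative")
    return n_variants / n_codons


def fold_range(values: Iterable[float]) -> float:
    """Ratio of the largest to the smallest value in a collection."""
    values = list(values)
    if not values or min(values) <= 0:
        raise ValueError("values must be non-empty and strictly positive")
    return max(values) / min(values)


if __name__ == "__main__":
    # Variant counts quoted in the text: 57 for IGF1 and 872 for IGF2R.
    print(round(fold_range([57, 872]), 1))
    # Hypothetical gene (not from the paper): 100 variants over 250 codons.
    print(changes_per_codon(100, 250))
```

Dividing by coding length is what allows a short gene such as IGF1 to be compared with a long one such as IGF2R; the raw counts differ by roughly 15-fold, whereas the per-codon rates reported in the text span a much narrower range.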

Population variation in IGF1 and IGF2
IGF1, IGF2, and INS compose a family of evolutionarily related secreted proteins (7-9) that bind to similar receptors (6, 10, 12) but differ in the overall topography of their protein precursors. Unlike INS, the progenitors of IGF1 and IGF2 contain COOH-terminal extensions or E domains, which are cleaved from IGF1 or IGF2 by proprotein convertases subsequent to protein secretion (58,59). It is unknown whether there are distinct biological functions of the single E region in the IGF2 precursor or the two E domains of IGF1 (Refs. 60 and 61; see Fig. 1).
Illustrated in Table 1 are comparisons of the extent of variation found within the coding segments of their three genes in the 60,706 individuals assessed in ExAC. The overall number of alterations that could change amino acids or modify production of the protein was similar among IGF1, IGF2, and INS, although the total representation of these variant alleles in the population was far lower for INS (<0.01%) than for IGF1 or IGF2 (0.6% and 2.5%, respectively). Amino acid substitutions in the IGF1 precursor were fairly evenly distributed among the signal peptides, the 70-residue mature IGF1, and the COOH-terminal E peptides (Fig. 1A), although two alterations in mature IGF1, A67T and A70T, and one in the E B domain, A187D, accounted for most of the variation (Fig. 1A). Of note, there were no individuals identified with either an R36Q or a V44M substitution or with a deletion within IGF1 exon 3 corresponding to mutations mapped in three children with severe short stature (31,33,35). Thus, if these alleles are present in the population, they have an ultra-low frequency.
Although population variation was 4-fold higher for the IGF2 gene (2.5%) than for IGF1 (0.6%, Table 1), most of this could be attributed to a single non-coding C to T change located 5′ to the adjacent INS gene (see single nucleotide polymorphism (SNP): rs14948363). Unlike what was observed for IGF1, most of the coding variation in IGF2 was concentrated in the COOH-terminal E peptide, with a single amino acid substitution, R157H, accounting for >75% of all changes (Fig. 1B). The functional consequences of this non-conservative change are unknown, because, as noted above, the specific roles of the IGF2 E segment have not been elucidated (60,61).

Variability in IGF receptors in the population
The IGF1R and INSR are related ligand-activated tyrosine kinases that share a similar length and three-dimensional structure (6,12). Both receptors bind their eponymous ligands with high affinity and also bind IGF2 with moderately high affinity, although a splicing variant for the INSR lacking the 12 codons of exon 11 (termed INSR-A (12)) binds IGF2 with far higher affinity than INSR-B (12). Population variation appears to occur with comparable frequency for the two receptors (3.3% for IGF1R and 2.1% for INSR; Table 2). Thirteen of the 18 mutations in the IGF1R gene that have been associated with human growth deficiency syndromes are amino acid substitutions (41,42). Alterations at 5 of these 13 sites are present in the ExAC database (Table 6), with one change, R511Q, detected in ∼0.15% of the population (Fig. 2A). Five other predicted amino acid substitution variants in the IGF1R gene have allelic frequencies of 0.1-0.7% (Fig. 2A). All six substitutions map to the extracellular part of the receptor protein (Fig. 2A) and collectively compose nearly 60% of the total population variation identified in the IGF1R in ExAC.

[Table fragment; column headings and rows preceding IGF1R not recovered]
IGF1R    382   25    6    0    0
IGF2R    805   48   22    1    2
IGFBP1    72    6    0    0    1
IGFBP2    80    4    1    3    0
IGFBP3    63    2    0    0    0
IGFBP4    52    3    0    0    0
IGFBP5    78    7    0    1    0
IGFBP6    62    7    1    0    0
IGFALS   322   26    6    3    0

The IGF2R shares structural elements and functions with the CDMPR (13,16). Both receptors contain units of ∼145 amino acids (15 in IGF2R and 1 in CDMPR), and both proteins are responsible for delivery of lysosomal enzymes from their intracellular sites of synthesis to the lysosome (13,16), although only the IGF2R appears capable of internalizing these enzymes from the extracellular space (16). IGF2R also binds IGF2 with high affinity, a function that is mediated by repeat number 11 (13,16).
Both the IGF2R and CDMPR genes appear to be highly polymorphic in the human genome, although the number of variants per codon is only marginally higher than for IGF1R (Table 2). However, in the IGF2R, one predicted amino acid substitution variant, R1619G, located in repeat 11, which contains the IGF2 binding unit, is far more prevalent in the ExAC study population than the amino acid found in the reference IGF2R gene in the Ensembl Genome Browser (∼90% of alleles; Fig. 2B), expanding on a prior observation using a much smaller study population (48). The functional differences between these two presumptive polymorphisms are unknown but now may be tested. Two additional substitutions, L252V, in repeat number 2, found in >13% of the population, and N2020S, in repeat 14, detected in nearly 10% (Fig. 2B), also were noted previously (48). Two other amino acid substitutions in IGF2R found in some individuals with liver cancer, G1449V and G1464E (49), are not present in the ExAC population. There is also variability within IGF2R repeats 3 and 9, the domains responsible for binding lysosomal enzymes, with changes detected in 0.3% and 0.7% of the population, respectively. A similar degree of polymorphism (0.3%) is found in the single repeat element in the CDMPR.

Population aspects of IGFBPs
The IGFBP family in mammals consists of six proteins that arose during speciation by duplication and diversification of a progenitor gene (18,20,21). All six IGFBPs are secreted proteins that range in length from 201 to 289 amino acids and share a similar 3-domain structure (20,21) (Figs. 3 and 4). The extent of variability in the exons of the genes encoding IGFBPs is generally similar to that of the two IGFs and two IGFRs, ranging from 0.27 to 0.34 changes per codon, with the exception of IGFBP4 at 0.22 (Table 3). However, total allelic variation ranged from very low in the population (0.2% for IGFBP4, 0.6% for IGFBP6) to moderate (2.8%, 3.8%, and 5.0% for IGFBP5, IGFBP2, and IGFBP3, respectively) to high (IGFBP1 at 36.0%, Table 3). Of note, most of the predicted changes consist of one or a few amino acid substitutions per IGFBP. For IGFBP1, the bulk of the variability is the result of a single change of I253M in the COOH-terminal domain (35.7%, Fig. 3A). For IGFBP2 it is R41Q in the NH2-terminal segment (2.7%, Fig. 3B), and for IGFBP3 it is A32G in the NH2-terminal domain, H164P in the linker, and T284I in the COOH-terminal region (2.0%, 1.4%,

IGFALS is an outlier
IGFALS, a 578-amino acid secreted liver protein, shares no structural or sequence similarity with IGFBPs, IGF receptors, or IGFs (27,46). Its physiological role, in forming ternary complexes in the circulation with IGFBP3 or IGFBP5 and IGF1 or IGF2, is to prolong IGF half-life in the blood (27). From the perspective of the population genetics of IGF family members, IGFALS is an outlier, as its overall rate of variation was much higher than that of each of the other genes analyzed (0.58 nonsynonymous changes/codon; Table 3). However, as seen with IGFBPs, 2 amino acid substitutions are responsible for half of the allelic variability in the population for IGFALS: L135F (0.4%) and R586W (2.1%; Fig. 5).
A small number of individuals with deficiency of IGFALS have been ascertained because of moderate growth failure (46,47). Of the mutations identified in these individuals, 10 are single amino acid substitutions, 2 are small in-frame duplications, 4 are frameshifts, and 1 is a stop codon (Table 7). Alterations at 5 of 10 substitution sites, at 1 of 2 internal duplications, at 3 frameshift locations (although 1 is composed of an amino acid substitution), and at the single stop codon are present in the ExAC database at the ultra-low frequency of 1-4 alleles in the population, whereas the others are absent (Table 7).
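The counts in the preceding paragraph can be tallied directly. The following Python sketch simply restates the numbers given in the text; the dictionary layout and function names are hypothetical, and the frequency calculation assumes that all 121,412 alleles were called at each site.

```python
TOTAL_ALLELES = 121_412  # 2 x 60,706 individuals

# (sites reported in patients, sites also present in ExAC), from the text above.
IGFALS_MUTATIONS = {
    "amino acid substitution": (10, 5),
    "in-frame duplication": (2, 1),
    "frameshift": (4, 3),
    "stop codon": (1, 1),
}


def tally(mutations: dict) -> tuple:
    """Return (total reported sites, sites also found in the population)."""
    reported = sum(pair[0] for pair in mutations.values())
    present = sum(pair[1] for pair in mutations.values())
    return reported, present


def one_in(allele_count: int, total_alleles: int = TOTAL_ALLELES) -> float:
    """Express an allele count as '1 in N' chromosomes."""
    if allele_count <= 0:
        raise ValueError("allele_count must be positive")
    return total_alleles / allele_count


if __name__ == "__main__":
    reported, present = tally(IGFALS_MUTATIONS)
    print(reported, present)
    # The text gives 1-4 alleles for the variants that are present.
    print(one_in(1), one_in(4))
```

By this tally, 10 of the 17 reported disease-associated sites show some alteration in ExAC, and an allele count of 1-4 corresponds to a frequency between about 1 in 121,000 and 1 in 30,000 chromosomes.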

Discussion
Data from population-based genome sequencing of >60,500 people from ExAC (53) have been analyzed here to gain insights into the population genetics of the IGF system. Results identify a large number of possible missense alterations and other modifications in the coding regions of all 11 IGF family genes studied. The allelic frequency of the vast majority of these changes was very low, with most found in <0.1% of the population (Tables 4 and 5). However, 3 predicted amino acid substitutions were highly prevalent, as they were identified in 13.3% and 89.6% of IGF2R alleles (Fig. 2B) and in 34.8% of IGFBP1 alleles (Fig. 3A). These modifications, along with others found in several percent of alleles in the coding regions of IGFBP2 (Fig. 3B), IGFBP3 (Fig. 3C), IGFBP5 (Fig. 4B), and IGFALS (Fig. 5), suggest that fairly common protein sequence variants are present in the population and thus have the potential to alter physiologically important aspects of IGF actions.

Limited population variability in IGF1 and IGF2
Coding variation in IGF1 and IGF2 genes is uncommon but is 60-250× more prevalent in the population than was detected in INS (Table 1), possibly reflecting both the essential role of INS in normal metabolic regulation and the important but less critical physiological functions of IGF1 and IGF2 in development and growth. Although population-based polymorphisms are 4× more frequent in IGF2 than in IGF1 genes (Table 1), 80% of the difference (∼2% of all alleles) is caused by a single noncoding G to C alteration in DNA that maps near the adjacent INS gene.
Three presumptive amino acid substitutions in the IGF1 precursor (Fig. 1A) are responsible for more than half of the variation detected in the ExAC population (Table 1). Two of these substitutions, found within the mature 70-residue IGF1, are changes of alanine to threonine at residues 67 and 70 in the D domain of the molecule (Fig. 1A) (7,8). Although it is possible that one or both of these modifications might alter the stability or another aspect of IGF1 protein synthesis or half-life, the conservative nature and the locations of these modifications suggest that they are unlikely to interfere significantly with the ability of IGF1 to bind to the IGF1R or to IGFBPs. This segment of the protein, composing the last four residues of the eight-amino acid D domain, does not interact with other proteins or even with other parts of IGF1 and was not seen in three-dimensional structural determinations of IGF1 by protein crystallography (62). Substitutions and in-frame deletions occur throughout the IGF1 precursor protein but with allelic frequencies that are so low (from 1:3000 to 1:120,000) that they are unlikely to have a significant population impact on human physiology.
IGF2 presents a picture broadly similar to IGF1 in terms of its population genetics. Most variants are uncommon, except for the non-coding change noted above, although in aggregate the IGF2 gene has the highest frequency of potential loss of function alleles of all IGF family members (Table 5). The most prevalent amino acid substitution in the IGF2 progenitor is located in the COOH-terminal part of the E-domain (Fig. 1B), a section of the molecule with limited known biological function (8,61). A single truncation mutation in the IGF2 gene has been described in members of a family with severe growth deficiency and physical characteristics resembling Silver-Russell syndrome (38). Equally rare is an amino acid substitution at this same codon, which has been found on a single chromosome in the ExAC study population. The modification, A64P, is nonconservative, and it seems likely that it could significantly perturb IGF2 protein structure.

Rare and common polymorphisms in IGF receptors
Potential variations in amino acid sequence have been detected in nearly a third of the codons for both IGF receptors in the ExAC database, a frequency slightly higher than observed for INSR but similar to the CDMPR (Table 2). For the IGF1R, predicted alterations were found to be 3 times higher in the extracellular α chain than in the transmembrane and intracellular β chain, and 5 substitutions in the α chain accounted for ∼50% of all modified alleles in the population (Fig. 2A). One of these alterations, R437H, corresponds to a heterozygous change found in the IGF1R in a centenarian (63). In limited studies, lymphocytes from this individual were shown to exhibit reduced IGF1-mediated receptor signaling compared with control cells from another centenarian, and it was postulated that diminished IGF1R function caused by this allele might relate to extended lifespan (63). Another missense allele identified by the same investigators, A67T, was only found on 2 of 121,412 chromosomes in the ExAC database and is thus unlikely to have a significant physiological impact in the population.
The kinase domain extends over >60% of the β chain of the IGF1R (Fig. 2A). Substitutions or other modifications are predicted in ExAC for ∼40% of these ∼400 amino acids but are generally uncommon, having a collective population allelic frequency of ∼0.35% (Fig. 2A). Similarly, with one possible exception (R511Q, Table 6), mutations of the IGF1R at residues associated with growth deficiencies are rare, and most have not been detected in the ExAC study population (Table 6).
The IGF2R is a multifunctional protein involved in the clearance of IGF2 from the extracellular space and in the targeting of mannose 6-phosphate-containing lysosomal enzymes to the lysosome (16). Several predicted amino acid substitutions in the IGF2R appear frequently in the ExAC database, with one, R1619G, found in repeating unit 11, the IGF2-binding region (13,16), detected in nearly 90% of alleles, indicating that it is ∼9× more prevalent in the population than the genomic reference residue (Fig. 2B). The comparative effect of either amino acid on the rate or extent of removal of IGF2 from the extracellular space and thus on IGF2 actions is now worth evaluating.
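The "∼9×" figure follows from the allele frequencies alone. As a minimal sketch, assuming that the site has only two alleles (the reference residue and R1619G), so that the reference frequency is one minus the variant frequency, the ratio can be computed as follows; the function name is hypothetical.

```python
def alt_to_ref_ratio(alt_frequency: float) -> float:
    """Ratio of variant to reference allele frequency at a two-allele site.

    Assumption: the reference allele accounts for every chromosome that
    does not carry the variant.
    """
    if not 0 <= alt_frequency < 1:
        raise ValueError("alt_frequency must be in the interval [0, 1)")
    return alt_frequency / (1 - alt_frequency)


if __name__ == "__main__":
    # "Nearly 90% of alleles", as described in the text.
    print(round(alt_to_ref_ratio(0.90), 1))
    # The 89.6% value given for IGF2R alleles elsewhere in the paper.
    print(round(alt_to_ref_ratio(0.896), 1))
```

A rounded frequency of 90% gives a ratio of 9, and the more precise 89.6% gives a ratio closer to 8.6, so the "∼9×" description holds under either value.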
Predicted amino acid changes are also fairly common in the two domains of IGF2R that are responsible for binding lysosomal enzymes, 0.3% for repeat 3 and 0.7% for repeat 9, as are alterations in the single repeat in the CDMPR (0.3%). As none of this information had been available until now, an opportunity exists to determine whether changes in the sequence and/or structure of either of these proteins might alter lysosomal enzyme targeting and possibly lysosomal activity and function.

Wide range of variation in IGFBPs and IGFALS
The frequency of allelic modifications among the six IGFBP genes in the ExAC study population spans a 180-fold range, from 0.2% for IGFBP4 to 36% for IGFBP1 (Table 3). Only a few amino acid substitutions per IGFBP are responsible for most of this population variability (Figs. 3 and 4). At present, the biological significance of these modifications is unknown, as there are no specific diseases connected to IGFBP gene alterations in humans (Ref. 30; see OMIM). However, because several biochemical parameters of each IGFBP, including protein half-life and binding affinity for IGFs, can potentially influence IGF concentrations in the blood and extracellular fluid and thus modify IGF actions, one or more of these changes in individual IGFBPs could affect specific medical problems, including cancer susceptibility or others (22)(23)(24)(25).
IGFALS is dissimilar to any other IGF family member. For example, its structure was shown recently to resemble Toll-like family receptors (28). Its primary function is as a carrier protein in blood for complexes of IGFBP3 and IGFBP5 and either IGF1 or IGF2 (26,27). The total percentage of alternative alleles for the IGFALS gene in the population is similar to that of several IGFBPs, although the rate of variation is significantly higher (0.58 changes/codon; Table 3). One substitution, R586W, accounting for 40% of all variants, maps to the second cysteine-rich region of IGFALS (Fig. 5) and resides near the COOH terminus of the recently proposed horseshoe-shaped structural model for the protein (28). Mutations in IGFALS are a rare cause of post-natal growth failure (46). Several of these aberrant alleles are present at low frequency in the ExAC population, but half of the amino acid substitutions are absent (Table 7). Most of these changes have been predicted to severely alter IGFALS structure, thus providing a rational basis to understand disease-linked dysfunction (28).

Limitations and implications of population-based genome sequence data on understanding IGF actions in humans
As with any large-scale DNA sequencing endeavor, the ExAC database contains the raw material for novel biological insights as well as both errors and ambiguities. From the perspective of IGF family components, potential problems include the choice of some minor transcripts as the reference mRNA, although this problem can be resolved using databases such as Ensembl. Among the 11 IGF family genes, only IGF2 and IGFALS required explication, as in both cases the assigned mRNA encoded a signal peptide that was much longer than what was used by the major transcripts. More significantly, some of the proposed variants cannot be mapped to the respective gene or protein, as noted for IGFALS (Table 3), in which several identified highly polymorphic amino acid substitutions do not exist in the molecule. Other possible limitations of the data include the fact that even though many population groups are represented, >60% of study subjects are of European ancestry, and only ∼8% are either from Africa or Latin America (53). Thus, the true rate of protein variation among humans may not yet be realized. In addition, as a minor point, there is an unknown error rate associated with nucleotide changes that appear only once in the 121,412 alleles evaluated.
Despite these limitations, analysis of these data points to new opportunities to reevaluate normal human IGF physiology and pathology. From a physiological perspective, IGF system components play major roles in the complex interactions that define normal somatic growth in children, including the relationships between genetic and environmental factors (64). The strength, duration, and durability of these interactions and their range of outcomes now may be re-examined in light of the many versions of IGF1, IGF2, IGFBPs and receptors, and IGFALS present in the population. As IGF signaling pathways have also been postulated to be involved in aging (4, 65) and in disease pathogenesis in adults (3,11,12,22), it seems likely that some variants or combinations may enhance susceptibility and others may be neutral or protective.
The extensive variability captured by ExAC should be traceable to our various recent and more distant ancestors, including extinct populations such as Neanderthals and Denisovans (66,67). The genomes of modern humans contain traces of past contacts with these populations, and they have left us with DNA marks that clearly influence certain traits, such as hair color and skin pigmentation, and possibly certain disease predispositions (67). Although unlikely to occur, an ExAC-sized DNA database for Neanderthals, Denisovans, and other more distant human ancestors could lead to remarkable insights about human origins and how specific genetic influences shape human variation and biology. It is clear that new hypotheses inspired by these data can provide novel insights into the complex biology of IGF actions and those of other proteins in health and disease.

Experimental procedures
Data on variation in human IGF1, IGF2, the IGF1 and IGF2 receptors, IGFBPs 1-6, IGFALS, INS, INSR, and the CDMPR were derived from information in the ExAC genome browser, consisting of compiled results from exome sequencing of 60,706 individuals (68). Human transcripts and genes were accessed from the Ensembl Genome Browser using genome assembly GRCh38. Sources of human protein sequences and domains were the National Center for Biotechnology Information Consensus CDS Protein Set (www.ncbi.nlm.nih.gov) and the UniProt browser. Other databases consulted included Online Mendelian Inheritance in Man (OMIM) and the Growth Genetics Consortium.
Author contributions-P. R. conceived of the study, collected and interpreted the results, and wrote the manuscript.