A Human Short Open Reading Frame (sORF)-encoded Polypeptide That Stimulates DNA End Joining

Background: Large numbers of peptides encoded in human short open reading frames have been recently identified but not yet functionally characterized.
Results: A peptide interacts with the Ku heterodimer and stimulates nonhomologous end-joining DNA repair.
Conclusion: Newly discovered cellular peptides can be functionally characterized by identifying their interaction partners.
Significance: Short ORF-encoded polypeptides participate in essential cellular processes.

The recent discovery of numerous human short open reading frame (sORF)-encoded polypeptides (SEPs) has raised important questions about the functional roles of these molecules in cells. Here, we show that a 69-amino acid SEP, MRI-2, physically interacts with the Ku heterodimer to stimulate DNA double-strand break ligation via nonhomologous end joining. The characterization of MRI-2 suggests that this SEP may participate in DNA repair and underscores the potential of SEPs to serve important biological functions in mammalian cells.

Interest in the complete coding potential of the human genome has recently exploded due to technical advances permitting detection of translation of nonannotated sequences. In particular, widespread translation of short open reading frames (1) (sORFs) producing peptides <150 amino acids has been reported in mammalian cells by ribosome profiling (2,3), proteomics (4), and peptidomics (5-8). The apparent prevalence of sORF translation raises the question of whether sORF-encoded polypeptides (SEPs) are functional biomolecules.
Although sORF translation is prevalent, few mammalian SEPs have been demonstrated to have biological functions. SEPs regulating limb morphogenesis (9,10) and cardiac function (11) have been characterized in insects, and a few bioactive yeast (12) and plant (13-15) SEPs have been reported. In humans, the only well characterized SEP is the peptide humanin, which inhibits amyloid β-induced neuronal cell death (16,17). The biological relevance of SEP translation can only be addressed if general strategies for their functional annotation can be developed. Here, we report a functional proteomics strategy to identify the biological functions of SEPs, and use it to identify the SEP MRI-2 as a novel Ku interaction partner.

EXPERIMENTAL PROCEDURES
Cloning and Gene Synthesis-MRI-2, Ku70, and Ku80 coding sequences were obtained from Open Biosystems. The MRI-2 coding sequence was subcloned with a C-terminal FLAG epitope tag into pcDNA3, and with a C-terminal His6 tag into pET21a, for mammalian and bacterial expression, respectively. The MRI-1 coding sequence with a C-terminal FLAG tag was synthesized by GenScript and subcloned into pcDNA3. The MRI-3-FLAG coding sequence was subcloned from this construct into pcDNA3. The Ku80 coding sequence was subcloned into pcDNA3 with a C-terminal HA tag. The Ku70 coding sequence was subcloned into pcDNA3 with a C-terminal c-Myc tag.
Co-immunoprecipitation and Proteomics-FLAG-tagged MRI-1-3 in pcDNA3 (or empty pcDNA3) was transfected into a 10-cm dish of HEK 293T cells using Lipofectamine 2000 and Opti-MEM (Life Technologies). 24 h after transfection, cells were harvested and lysed using Tris-buffered saline with 1% Triton X-100 (TBS-T) and Roche Applied Science Complete protease inhibitor cocktail tablets. Cells were lysed on ice for 20 min followed by centrifugation at 14,000 rpm, 4°C, 15 min. Anti-FLAG agarose (clone M2, Sigma) was washed with TBS-T, collected by centrifugation, and then suspended in the cell lysate. Bead suspensions were rotated at 4°C for 1 h and then washed three times with TBS-T. Bound protein was eluted with 3× FLAG peptide (Sigma) at 4°C for 1 h. For proteomics, samples were analyzed by SDS-PAGE with Coomassie Brilliant Blue stain, and bands elevated in the IP samples were identified and excised. For quantitation via spectral counting, the gel was cut straight across with a clean razor to excise the same molecular weight band in both the sample and the negative control.
Protein-containing gel slices were digested with trypsin overnight. The resulting peptide mixtures were extracted from the gel and run on an Orbitrap Velos (Thermo Fisher Scientific) with 90-min liquid chromatography and tandem mass spectrometry (LC-MS/MS) using a standard TOP20 method as described previously (7).
Mass spectra were analyzed using our in-house Proteome Browser System against the UniProt_human database. Carbamidomethylated cysteines were set as a fixed modification, and methionine oxidation and N-terminal acetylation were set as variable modifications. A mass error of 20 ppm was set for precursor ions, and 0.6 Da was set for MS/MS peaks. Two missed cleavages were allowed. Maximum False Discovery Rates were set to 0.01 on both peptide and protein levels. Minimum required peptide length was five amino acids. Interaction candidates were identified by spectral counting as proteins elevated >1.7-fold in the IP relative to the negative control, with an abundance cutoff of 20 spectral counts in the immunoprecipitate.
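The candidate selection rule described above can be illustrated with a minimal Python sketch. Only the two cutoffs (>1.7-fold, 20 spectral counts) come from the text. The names `ProteinCounts`, `fold_enrichment`, and `interaction_candidates` are hypothetical, the treatment of proteins absent from the control and the reading of the abundance cutoff as "at least 20" are assumptions, and the counts are invented for illustration; this is not the in-house software used in the study.

```python
from dataclasses import dataclass

FOLD_CUTOFF = 1.7    # stated in the text: elevated >1.7-fold in the IP
MIN_IP_COUNTS = 20   # stated in the text; treating the cutoff as ">=" is an assumption


@dataclass
class ProteinCounts:
    name: str
    ip_counts: int       # spectral counts in the FLAG immunoprecipitate
    control_counts: int  # spectral counts in the empty-vector control


def fold_enrichment(ip_counts: int, control_counts: int) -> float:
    """Ratio of IP to control spectral counts.

    Assumption: a protein absent from the control is treated as
    infinitely enriched rather than raising a division error.
    """
    if control_counts == 0:
        return float("inf")
    return ip_counts / control_counts


def interaction_candidates(proteins):
    """Return names passing both the abundance and fold-change cutoffs."""
    return [
        p.name
        for p in proteins
        if p.ip_counts >= MIN_IP_COUNTS
        and fold_enrichment(p.ip_counts, p.control_counts) > FOLD_CUTOFF
    ]


# Made-up counts for illustration only (not data from this study).
example = [
    ProteinCounts("protein_A", 50, 10),  # 5.0-fold, abundant: kept
    ProteinCounts("protein_B", 30, 25),  # 1.2-fold: rejected
    ProteinCounts("protein_C", 10, 0),   # below the abundance cutoff: rejected
    ProteinCounts("protein_D", 25, 0),   # absent from control, abundant: kept
]
print(interaction_candidates(example))
```

Applying the abundance cutoff before the ratio keeps low-count proteins, whose fold changes are dominated by sampling noise, out of the candidate list.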
Western Blotting-SDS-PAGE gels and transfers were performed as described previously (7). Immunoblots were blocked with Rockland Immunochemicals fluorescent blocking buffer and then probed with the following primary antibodies at a 1:1000 dilution in the same buffer: mouse anti-FLAG (clone M2, Sigma), mouse anti-XRCC6 (Ku70, clone 4C2-1A6, Sigma), mouse anti-XRCC5 (Ku80, clone 3D8, Sigma). Secondary antibodies, goat anti-mouse IR dye 800 and goat anti-rabbit IR dye 680 (LI-COR), were applied at a dilution of 1:4000 in Rockland Immunochemicals fluorescent blocking buffer. Infrared imaging was performed on a LI-COR Odyssey.
Immunofluorescence and Confocal Imaging-Cells were grown, transfected, fixed, and permeabilized as described previously (7). Etoposide (Sigma) was applied to cells at 50 µM for 4 h. Cells were stained with primary antibodies: mouse anti-FLAG (clone M2, Sigma); rabbit anti-HA (Rockland Immunochemicals); and/or chicken anti-c-Myc (Aves Labs). Secondary antibodies (Life Technologies) were goat anti-mouse Alexa Fluor 568, goat anti-rabbit Alexa Fluor 488, and goat anti-chicken Alexa Fluor 647. All antibodies were applied at a 1:1000 dilution in fluorescence blocking buffer (Rockland Immunochemicals) for at least 1 h at 4°C followed by three phosphate-buffered saline (PBS) washes after fixation with formalin and nuclear counterstain with Hoechst 33342.
To determine the nuclear fraction of MRI-2, regions of interest were manually drawn on whole cells, as defined by MRI-2 staining and the DIC image, and on the nucleus, as defined by Hoechst 33342 staining. Background was defined for whole cell and nucleus using an untransfected cell. For each cell, the ratio of the background-corrected nuclear sum intensity to the background-corrected whole cell sum intensity was used to calculate the nuclear fraction. Single cell data were plotted as histograms, and population averages were calculated. Statistical analysis of significance was by two-tailed Student's t test, and error bars represent S.E.
MRI-2 Purification-His6-tagged MRI-2 in pET21a was transformed into BL21 Star DE3 cells. A saturated 10-ml overnight culture in Luria Bertani broth (LB) with 100 µg/ml ampicillin was diluted into a 2-liter LB-ampicillin culture and grown at 37°C with shaking to optical density 0.6. Protein expression was induced with 1 mM isopropyl-1-thio-β-D-galactopyranoside, and cells were grown at 30°C with shaking for 4 h. Cells were harvested by centrifugation at 8000 rpm for 10 min at 4°C and stored at -80°C until use. Protein was purified using a protocol previously reported for Ku (18). Pure MRI-2 (assayed by SDS-PAGE) was dialyzed into 50 mM Tris-HCl, pH 8.0, 20% glycerol, 50 mM NaCl, and 1 mM DTT, aliquoted, and flash-frozen. Protein concentration was determined with Bradford assay (Pierce) with reference to BSA.
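The nuclear fraction calculation described above can be written as a short Python sketch. The function names `nuclear_fraction` and `mean_and_sem` are hypothetical, the intensities in the usage lines are invented, and the imaging software actually used for quantitation is not specified in the text; the sketch only restates the ratio of background-corrected sum intensities and the standard error of the population mean.

```python
import math
import statistics


def nuclear_fraction(nuclear_sum, cell_sum, nuclear_background, cell_background):
    """Background-corrected nuclear sum intensity divided by the
    background-corrected whole-cell sum intensity for one cell."""
    corrected_nuclear = nuclear_sum - nuclear_background
    corrected_cell = cell_sum - cell_background
    if corrected_cell <= 0:
        raise ValueError("whole-cell signal does not exceed background")
    return corrected_nuclear / corrected_cell


def mean_and_sem(values):
    """Population average and standard error of the mean (S.E.)."""
    if len(values) < 2:
        raise ValueError("need at least two cells to estimate S.E.")
    mean = statistics.fmean(values)
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return mean, sem


# Invented intensities (arbitrary units), for illustration only.
one_cell = nuclear_fraction(nuclear_sum=400, cell_sum=1100,
                            nuclear_background=100, cell_background=100)
population_mean, population_sem = mean_and_sem([0.2, 0.3, 0.4])
```

Each single-cell value would feed the histograms, and the mean and S.E. correspond to the population averages and error bars.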

RESULTS
We focused our study on a 69-amino acid SEP from the C7orf49 gene identified in our previous peptidomics profiling of K562 cells (Fig. 1A) (6,7). Because this SEP was annotated as a predicted protein in the NCBI nonredundant protein database, we did not include it in the list of reported SEPs (7); however, our evidence is the first experimental detection of this polypeptide in cells. Identifying a protein or polypeptide from a single MS/MS spectrum requires the highest spectral quality, and this spectrum passes our criteria (7), with a continuous y ion series of 11 fragments and an Sf score (23) of 0.88 (Fig. 1B). This provides the first evidence for endogenous expression of this SEP in a human cell line.
This SEP represented a good candidate for functional characterization for two reasons. First, C7orf49 is conserved in mammals (24), suggesting that this gene has undergone functional selection. Second, the alternatively spliced C7orf49 gene is predicted to produce three polypeptide isoforms with distinct sequences, one of which has a known function (Fig. 1C). Isoform 1, a 157-amino acid polypeptide, was identified as a modulator of retrovirus infection in a phenotypic screen (24), although its molecular mechanism remains unclear. C7orf49 isoform 1 was therefore named modulator of retrovirus infection homolog 1 (MRI-1). The SEP we detect corresponds to MRI isoform 2 (MRI-2), and the third isoform is referred to as MRI-3, neither of which has been previously characterized. Taken together, we characterized the MRI-2 SEP because of the conservation and known cellular function of the MRI/C7orf49 gene.
SEPs are small and are therefore more likely to function via interactions with other biomolecules than to have catalytic activity. Identifying SEP-protein interactions using co-immunoprecipitation (co-IP) proteomics represents a general strategy for SEP functional characterization. Co-IP of MRI-2 from HEK293T cells enriched two bands at 70 and 80 kDa (Fig. 2A). Semiquantitative proteomics via spectral counting revealed these proteins to be Ku70 and Ku80, the two subunits of the heterodimeric DNA end-binding protein Ku (Fig. 2B). We validated these interactions by Western blotting using antibodies against Ku70 and Ku80 (Fig. 2C). We note that enrichment of heat shock protein 70 family members (e.g. HSPA8, HSPA1A, HSPA9), which function as chaperones, is not likely to be functionally relevant as these proteins are the most commonly detected nonspecific interaction partners in co-IP experiments (25).
MRI-1 and -3 may serve related functions, so we repeated the co-IP for the additional isoforms. Co-IP of MRI-1 and -3, followed by quantitative proteomics and Western blotting, revealed that MRI-1 also binds to Ku, as well as DNA-dependent protein kinase catalytic subunit (DNA-PKcs), whereas MRI-3 does not enrich any proteins (Fig. 2, D-H). MRI-1 and -2 likely interact with Ku through their shared N-terminal 46 amino acids, which are missing in MRI-3 (Fig. 1C).
Ku and DNA-PKcs are key participants in the NHEJ pathway of DNA double-strand break (DSB) repair. DSBs, caused by ionizing radiation, DNA-damaging drugs, and free radicals (26), are so toxic that a single unrepaired lesion can lead to cell death (27); alternatively, unrepaired DSBs can lead to chromosomal instability and/or translocations (27). NHEJ is the predominant DSB repair pathway in mammalian cells (28,29). The first protein to bind the DSB is the Ku heterodimer, a ring-shaped DNA end-binding protein (30) that acts as the DNA-binding subunit of the DNA-dependent protein kinase (DNA-PK) (31). Binding of Ku and subsequent assembly of DNA-PK are followed by recruitment of additional repair factors that resect and ligate the DSB. The interaction of MRI-2 with Ku suggests that MRI-2 may participate in NHEJ.
MRI-2 expressed in HeLa and HEK293T cells was observed in both nuclear and cytoplasmic compartments (Fig. 3, A and C, top). The nuclear fraction of MRI-2 in HeLa ranged from 19 to 45% of the total protein, with a population average of 29 ± 7% (S.E.) (Fig. 3A, top, and 3C). Similarly, in HEK293T cells, nuclear MRI-2 ranged from 14 to 80%, with a population average of 33 ± 3% (Fig. 3B, top, and 3D).
Although a significant fraction of MRI-2 is nuclear, MRI-2 does not contain an obvious nuclear localization sequence. This SEP is small enough to passively diffuse through the nuclear pore, which is permeable to proteins <20-40 kDa (32). We hypothesize that the mechanism of MRI-2 nuclear localization depends upon its passive diffusion into nuclei followed by association with Ku, which is almost exclusively nuclear (33). For factors that depend upon protein-protein interactions for localization, overexpression can saturate interaction partners, leading to aberrant cellular distribution (34-36). This suggests that the cell-to-cell variation in MRI-2 nuclear localization might be caused by different levels of MRI-2 expression.
We therefore examined whether increased Ku expression, through co-expression of Ku70 and Ku80, increased nuclear localization of MRI-2. Co-expression of Ku resulted in enhanced nuclear localization of MRI-2, which ranged from 23 to 85% (average 38 ± 2%) in HeLa and from 27 to 85% (average 60 ± 3%) in HEK293T (Fig. 3, A and B, bottom, and C and D). This represents a Ku-mediated increase in MRI-2 nuclear localization of 33% in HeLa (p = 0.005) and 82% in HEK293T (p = 8 × 10⁻⁸). Inspection of nuclei of co-transfected cells revealed co-localization of MRI-2 and Ku80 (Fig. 3E). We then tested the effect of Ku co-expression on the subcellular distribution of MRI-1 and -3. Nuclear localization of MRI-1 was enriched by Ku co-expression, whereas MRI-3 was unaffected (Fig. 3F). These data are consistent with our hypothesis that nuclear enrichment of MRI-1 and -2 is dependent upon direct interaction with Ku.
FIGURE 1. Detection of C7ORF49 isoform 2 (MRI-2) in K562 cells. A, peptidomics workflow to identify nonannotated short ORFs. The K562 cellular peptidome was isolated, fractionated, and subjected to liquid chromatography-mass spectrometry. Nonannotated peptides were identified by SEQUEST search against a K562 RNA deep sequencing library and subsequent removal of annotated sequences. B, MS/MS spectrum of the unique C7ORF49/MRI-2 C-terminal tryptic peptide, with detected fragment ions marked in blue (b ions) and red (y ions). C, multiple sequence alignment (ClustalW2) of the three MRI isoforms reveals that isoforms 1 and 2 have identical N-terminal sequences of 46 amino acids (red) before a frameshift generates a unique C-terminal sequence for isoform 2 (blue for isoform 2, green for isoform 1). Isoform 3 is identical to isoform 1, but lacks the N-terminal 46 amino acids.
We then examined the nuclear localization of MRI-2 after induction of DSBs using the topoisomerase inhibitor etoposide, which creates DSBs and induces NHEJ. Etoposide treatment increased the nuclear localization of MRI-2 relative to vehicle-treated cells (Fig. 4A), evident in both single cell histograms of the percentage of MRI-2 in the nucleus (Fig. 4B) and the population average of MRI-2 nuclear localization, which increases by ~31% (p = 0.007, Fig. 4C). Although the magnitude of this effect is small, nuclear recruitment of MRI-2 by etoposide treatment is consistent with a role for this protein in NHEJ in cells.
Using an electrophoretic mobility shift assay, we confirmed that MRI-2 interacts with Ku bound to DNA in vitro. Radiolabeled double-stranded DNA is incubated with purified Ku70 and Ku80, which results in formation of Ku-DNA or Ku2-DNA complexes that can be resolved by gel electrophoresis. The addition of purified MRI-2 results in the decreased mobility of the complex, confirming tripartite complex formation (Fig. 4D). Although we do not have independent confirmation that the lower mobility species contain MRI-2, this conclusion is consistent with the co-IP data. Furthermore, the progressive mobility shift suggests that multiple molecules of MRI-2 bind to each Ku-DNA complex.
Finally, we measured the effect of purified MRI-2 on NHEJ efficiency in vitro using a double-stranded DNA ligation assay in cell extracts (18,22,37). Purified recombinant MRI-2 was added to NHEJ-competent cell extracts, and the conversion of a radiolabeled DNA duplex to higher molecular weight oligomeric ligation products was monitored. Comparison of the basal NHEJ reaction to a reaction containing added MRI-2 (5 µM) demonstrated an increase in product formation of 68% (p = 0.02) (Fig. 4E). Controls employing a DNA-PK inhibitor, wortmannin, as well as an antibody against the required NHEJ protein, XRCC4, eliminate end joining in these extracts, confirming that the ligation activity observed is due to bona fide NHEJ. This demonstrates that MRI-2 stimulates DSB ligation via NHEJ.
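One common way to express the outcome of such a ligation assay is as the fraction of lane signal converted to product, compared between conditions. The Python sketch below illustrates that arithmetic only. The text does not state how product formation was quantified, so the fraction-of-lane approach, the function names `fraction_ligated` and `percent_increase`, and all band intensities are assumptions made for illustration.

```python
def fraction_ligated(substrate_intensity, product_intensities):
    """Share of total lane signal found in ligation products.

    Assumption: product formation is scored as the summed intensity of
    all higher molecular weight bands over the total lane signal.
    """
    product_total = sum(product_intensities)
    lane_total = substrate_intensity + product_total
    if lane_total <= 0:
        raise ValueError("lane contains no signal")
    return product_total / lane_total


def percent_increase(basal, stimulated):
    """Relative change of the stimulated reaction over the basal one."""
    if basal <= 0:
        raise ValueError("basal reaction shows no product")
    return 100.0 * (stimulated - basal) / basal


# Invented band intensities (arbitrary units), not data from this study.
basal = fraction_ligated(80, [15, 5])        # 20 of 100 units ligated
stimulated = fraction_ligated(70, [20, 10])  # 30 of 100 units ligated
stimulation = percent_increase(basal, stimulated)
```

Normalizing to total lane signal makes the comparison insensitive to loading differences between the basal and MRI-2-supplemented lanes.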

DISCUSSION
In this study, we have identified MRI-2 as an interaction partner of Ku. MRI-2 is recruited to the nucleus by Ku overexpression and by induction of DSBs, and enhances the rate of NHEJ in vitro. MRI-2 is therefore a novel NHEJ factor (Fig. 4F), which may stimulate NHEJ in a number of ways, such as enhancing DSB binding by Ku, promoting assembly of the repair complex, and/or releasing Ku from DNA to promote multiple turnovers. Future mechanistic studies will reveal the molecular role of this SEP in DSB repair.
The MRI gene produces three polypeptides, MRI-1-3, and although we focused on characterization of the interaction partners and functions of MRI-2, we also investigated the interactions of MRI-1 and -3. MRI-1 interacts with Ku as well as DNA-PK, suggesting that MRI-1 and -2 may both function in NHEJ. MRI-3 does not bind to Ku, suggesting that this isoform may be inactive. These varying interaction specificities identify a Ku binding sequence in MRI-1 and -2, and also suggest that alternative splicing of MRI transcripts could differentially affect NHEJ efficiency inside cells.
MRI-1 was originally identified as a factor that enabled retroviral infection of mammalian cells (24), based on genetic complementation of retroviral infection-resistant cells by overexpression of an MRI-1 cDNA clone. This study also reported that RNAi-mediated silencing of MRI decreases retroviral infection sensitivity. Because treatment of the cells with a proteasome inhibitor phenocopied MRI-1 overexpression, the authors concluded that MRI-1 acts on the proteasome. Our work, however, suggests a different mechanism for the involvement of the MRI gene in retroviral infection. It is well established that NHEJ in the host cell is utilized during retroviral infection, promoting stable viral genome integration by resolving linear DNA fragments of this process (38-41). Therefore, MRI-1 may enhance retroviral infection via its interaction with Ku, and MRI-2 may also be involved in this process. Future work to determine the role of NHEJ during the rescue of retroviral expression by MRI-1 and -2 will test this hypothesis.
In conclusion, hundreds of newly discovered SEPs have now been reported in human cells (1, 6-8), but few have been functionally characterized. The identification of the MRI-2-Ku interaction suggests that functional proteomics is a general method to identify the cellular and biochemical roles of newly discovered SEPs. The involvement of MRI-2 in NHEJ, a central process required for cell survival, suggests that many more SEPs may have important cellular functions and demonstrates the continued need to discover and functionally characterize all SEPs produced from the human genome.