Repeat-associated non-ATG (RAN) translation

Microsatellite expansions cause more than 40 neurological disorders, including Huntington's disease, myotonic dystrophy, and C9ORF72 amyotrophic lateral sclerosis/frontotemporal dementia (ALS/FTD). These repeat expansion mutations can produce repeat-associated non-ATG (RAN) proteins in all three reading frames, which accumulate in disease-relevant tissues. There has been considerable interest in RAN protein products and their downstream consequences, particularly for the dipeptide proteins found in C9ORF72 ALS/FTD. Understanding how RAN translation occurs, what cellular factors contribute to RAN protein accumulation, and how these proteins contribute to disease should lead to a better understanding of the basic mechanisms of gene expression and human disease.


Introduction and background
More than 40 different neurological diseases are caused by unstable microsatellite sequences (e.g. CAG, CCG, or G4C2) that are repeated multiple times at specific human genetic loci. For more than 25 years, research into these disorders has focused on the anticipated effects of the expansion mutations based on whether the mutations lie within or outside annotated protein-coding regions. In 2011, Zu et al. (1) discovered that repeat expansion mutations can produce a set of unexpected mutant proteins in multiple reading frames without the canonical AUG initiation codon. The discoveries of repeat-associated non-ATG (RAN) translation (1) and that repeat expansion mutations are often bidirectionally transcribed (2,3) mean that a single repeat expansion mutation can produce mutant proteins in all three reading frames from both sense and antisense transcripts (4-6). The discovery that expansion mutations express proteins without a canonical AUG-initiation codon raises mechanistic questions about how RAN proteins are initiated. Given the prevalence of repeats in the human genome (7,8), RAN translation may increase the diversity and function of the proteome. This Minireview will discuss mechanistic insights and disease implications of RAN translation.
The discovery of RAN translation added a new twist to the already complex field of repeat-expansion disorders (9-11). Unlike traditional mutations, expansion mutations are unstable and can change in length between generations (intergenerational instability) as well as within an individual (somatic instability) (12,13). Intergenerational instability can cause anticipation, a decrease in age of onset and an increase in disease severity from one generation to the next (14,15). Disease mechanisms in these disorders have traditionally been categorized based on the location of the expansion mutation within the corresponding gene. For example, disorders with expanded CAG repeats located within traditional protein-coding open reading frames (ORFs) have been attributed to gain-of-function (GOF) effects of the corresponding mutant expansion protein (16) (e.g. Huntington's disease (HD)). In contrast, disorders with expansion mutations located in noncoding regions have been considered protein loss-of-function (LOF) (e.g. fragile X syndrome) or RNA GOF disorders (2,9,10) (e.g. myotonic dystrophy type 1 (DM1) or type 2 (DM2)). These "noncoding" expansion RNAs accumulate as nuclear foci, which sequester RNA-binding proteins (RBPs) and lead to a loss of their normal function (5,6). For example, in DM1 and DM2, CUG and CCUG expansion RNAs sequester MBNL proteins from their normal splicing targets, and MBNL LOF leads to alternative splicing dysregulation (7-10). Although there is substantial evidence for both protein and RNA GOF mechanisms, the pathology and symptoms of these diseases are not fully explained by these mechanisms. For example, the restricted disease-specific populations of vulnerable neurons in the various diseases do not correlate well with the much broader CNS expression of the various expansion mutations. RAN translation and its downstream consequences offer new insights into disease mechanisms and, more importantly, new avenues for therapeutic interventions.

Translation and translational initiation
Although RAN proteins have been identified in a growing number of diseases (Table 1), very little is known about the underlying mechanisms of RAN translation. It is likely that RAN translation shares at least some features in common with canonical and/or internal ribosome entry site (IRES) initiation (Fig. 1). Canonical translation initiation is a complex process involving the stepwise activities of multiple protein complexes, including the following: 1) the recognition of the 5′ 7-methylguanosine cap; 2) the recruitment of the 43S preinitiation complex, which scans through the 5′ UTR of an mRNA until an AUG codon in the proper context pairs with the CAU anticodon of the Met-tRNAi; 3) eukaryotic initiation factor 2 (eIF2) then hydrolyzes its bound GTP, and the 60S ribosomal subunit is recruited; and 4) the majority of eIFs are then released, and translation elongation begins (see Fig. 1A) (17-20).
Although the majority of mRNAs are initiated by canonical scanning and AUG initiation, alternative initiation pathways have also been described. For example, many viral and a growing number of cellular mRNAs have been shown to use IRESs, complex RNA structures that direct ribosomal and eIF recruitment directly, for initiation (see Fig. 1B) (21). Close-cognate start codons, which differ from the AUG start codon by one nucleotide (22) and use the Met-tRNAi and methionine as the initiating amino acid, can also be used in mammalian cells. Starck et al. (23) showed that nonrepetitive CUG codons initiate translation of peptides presented by major histocompatibility complex class I molecules (24) with Leu-tRNA(Leu) as the initiating tRNA. Although these alternative translation initiation mechanisms may share some similarities with RAN translation, RAN translation differs in that it is associated with repeat expansion mutations, can occur in the absence of close-cognate initiation codons, and can produce proteins in all three reading frames (Fig. 1C). Because RAN translation can also occur across repeats located in traditional ATG-initiated ORFs (1, 25), a single repeat expansion transcript may undergo ATG-initiated translation in one frame and RAN translation in the other two reading frames.
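The definition of close-cognate start codons given above (codons that differ from AUG by one nucleotide) can be made concrete with a short enumeration. The following Python sketch is illustrative only: the function names and the example leader sequence are hypothetical and are not taken from the cited studies, and a sequence match says nothing about whether a codon is actually used for initiation.

```python
START = "AUG"
BASES = "ACGU"

def close_cognate_codons(start=START):
    """Return all codons that differ from `start` at exactly one position."""
    variants = set()
    for pos in range(3):
        for base in BASES:
            if base != start[pos]:
                variants.add(start[:pos] + base + start[pos + 1:])
    return sorted(variants)

def find_candidate_starts(rna):
    """List (position, codon) pairs for AUG and close-cognate codons, any frame."""
    candidates = set(close_cognate_codons()) | {START}
    return [(i, rna[i:i + 3])
            for i in range(len(rna) - 2)
            if rna[i:i + 3] in candidates]

codons = close_cognate_codons()
print(codons)

# Hypothetical 5' leader, used only to exercise the scan.
hits = find_candidate_starts("GCCCUGGCAUGC")
print(hits)
```

Three positions times three alternative bases give nine close-cognate codons, which include the CUG codon discussed above.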

Discovery and initial characterization of RAN translation
The first evidence for RAN translation arose from studies attempting to separate the RNA and protein GOF effects in spinocerebellar ataxia type 8 (SCA8) (1). SCA8 is a dominantly inherited, slowly progressive neurodegenerative disorder caused by a CTG·CAG repeat expansion (26,27). Prior to the discovery of RAN translation, SCA8 was the only disorder in which both RNA and protein gain-of-function disease mechanisms had been implicated (28). Bidirectional transcription of the SCA8 expansion mutation produces CUG expansion transcripts that form RNA foci (29) and a CAG expansion transcript expressed in the opposite direction that encodes a nearly pure ATG-initiated polyglutamine (poly(Gln)) expansion protein (28). Surprisingly, mutating the ATG-initiation codon did not prevent expression of the poly(Gln) expansion protein (1). Zu et al. (1) demonstrated that CAG expansions lacking an ATG initiation codon can produce homopolymeric expansion proteins in all three reading frames (i.e. poly(Gln), poly(Ser), and poly(Ala)). Additional experiments showed the following: 1) no evidence of RNA editing that could have introduced a start codon; 2) frameshifting was not required for protein expression in multiple frames; 3) translation appeared by MS data to begin within the repeat itself, at least for the poly(Ala) frame; 4) multiple RAN products from different frames can be produced in the same cell; 5) RAN translation is repeat length-dependent and favored by RNAs that form secondary structures; and 6) RAN proteins were shown to be toxic to cells. Subsequent analyses showed in vivo evidence that a novel SCA8 RAN poly(Ala) protein accumulates in cerebellar Purkinje cells in SCA8 mice and human autopsy tissue. More recently, Ayhan et al. (30) demonstrated that a novel RAN poly(Ser) protein expressed from ATXN8 CAG expansion transcripts accumulates in white matter regions in SCA8 mouse and human cerebella. Additionally, these authors showed that steady-state levels of SCA8 poly(Ser) and other RAN but not ATG-initiated proteins were reduced by knockdown of the initiation factor eIF3F. A novel CAG-encoded RAN poly(Gln) protein was also detected in myotonic dystrophy type 1 (DM1) mouse and human tissues, including patient myoblasts, skeletal muscle, and blood (1). Additionally, in both humans and mice, poly(Gln) aggregates were shown to co-localize with caspase-8 (1), an early indicator of poly(Gln)-induced apoptosis (31). The 2011 discovery of RAN translation in SCA8 and DM1 generated substantial interest by the scientific community into both the mechanisms of this novel type of promiscuous translation and the role of RAN proteins in neurodegenerative disease.
Table 1. Microsatellite repeats and their sense (S) and antisense (AS) protein products. Recoverable entries: Gly-Pro S/AS (33-35, 124, 191); Gly-Ala S (33-35, 191); Gly-Arg S (33-35, 123, 191); Pro-Ala AS (35, 124); Pro-Arg AS (33-35, 123, 124, 191).
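The observation that a CAG expansion can yield poly(Gln), poly(Ser), and poly(Ala) follows directly from reading the same repeat in each of the three frames. The Python sketch below illustrates this with the standard genetic code; the helper names (`translate_frame`, `repeat_products`) and the repeat length of 10 are arbitrary choices for illustration, and the code models only codon reading, not initiation.

```python
from itertools import product

# Standard genetic code, codons ordered with bases U, C, A, G at each position.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def translate_frame(rna, frame):
    """Translate `rna` from offset `frame` (0, 1, or 2), ignoring trailing bases."""
    codons = (rna[i:i + 3] for i in range(frame, len(rna) - 2, 3))
    return "".join(CODON_TABLE[c] for c in codons)

def repeat_products(unit, n_units):
    """Return the peptide read from each of the three frames of a pure repeat."""
    rna = unit * n_units
    return [translate_frame(rna, frame) for frame in range(3)]

cag = repeat_products("CAG", 10)
for frame, peptide in enumerate(cag):
    print(frame, peptide)
```

Frame 0 reads CAG (Gln), frame 1 reads AGC (Ser), and frame 2 reads GCA (Ala), so each frame of a pure CAG tract encodes a different homopolymer.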
Since the discovery of RAN translation in SCA8 and DM1 (1), RAN proteins have been reported in fragile X tremor ataxia syndrome (FXTAS) (32), C9ORF72 ALS/frontotemporal dementia (C9-ALS/FTD) (33-35), Huntington's disease (25), and spinocerebellar ataxia type 31 (37). RAN translation has now been shown to occur across several different types of repeat motifs (CAG·CTG, CGG·CCG, G4C2·G2C4, and TG2A2·T2C2A), which share some common themes, including repeat length-dependence and the formation of unusual RNA secondary structures (38-40). Given the potential impact of RAN translation in disease, much of the research on RAN translation has focused on the expression of RAN proteins and the characterization of their downstream consequences. This Minireview will focus on recent discoveries of RAN translation in disease and what is currently known about the mechanisms of RAN translation.
Figure 1. A, canonical translation initiation involves the binding of the 5′ mRNA cap (eIF4F consisting of eIF4G, eIF4E, eIF4A, and eIF4B) and mRNA poly(A) tail (PABP), unwinding of the mRNA by the helicase activity of eIF4A, and recruitment of the 43S complex (eIF5, eIF3, eIF2, and 40S ribosomal subunit) followed by scanning of the mRNA 5′ UTR in a 5′ to 3′ direction by the engaged 43S complex. Recognition of the initiation codon results in 48S initiation complex formation and displacement of several initiation factors. B, internal ribosome entry site initiation occurs in a cap-independent manner from multiple viral and cellular RNA sequences that involve the recruitment of the cellular 43S ribosomal complex to internal sites within the RNA by specific IRES trans-acting factors (ITAFs). Depending upon the viral IRES group (I-IV), all, some, or none of the typical canonical translation factors, including the initiation codon, may be required for translation initiation. C, repeat-associated non-ATG translation initiation is a repeat-length-dependent process that allows for initiation at noncanonical codons either within or adjacent to the expanded repeat tract. Evidence from the FXTAS CGG repeats and some reports for G4C2 repeats support a requirement for the 5′ mRNA cap, eIF4E, and eIF4A, suggesting cap-dependent and scanning mechanisms. However, other reports support cap-independent translation initiation mechanisms more similar to IRES initiation. The identity and requirement of other cellular initiation factors involved in RAN translation have yet to be determined.

Fragile X tremor ataxia syndrome
Expansion of the FMR1 CGG repeat to between 55 and 200 repeats results in FXTAS, a late-onset disease primarily affecting males that is characterized by tremor, ataxia, parkinsonism, and cognitive decline (41). FXTAS patients have increased expression of the repeat-containing RNA (42, 43) and show ubiquitin-positive inclusions throughout the cerebral cortex, brainstem, and cerebellum (43, 44). Although some studies support the contribution of a toxic RNA GOF mechanism, the discovery and characterization of RAN proteins in FXTAS suggest RAN proteins also contribute to disease.
In 2013, Todd et al. (32) demonstrated that translation of expanded CGG repeats results in the expression of RAN proteins in the polyglycine (FMR-poly(Gly)) and polyalanine (FMR-poly(Ala)) but not polyarginine (FMR-poly(Arg)) reading frames in cell culture. The FMR-poly(Gly) protein has been shown to co-localize with the ubiquitinated inclusions previously reported in FXTAS patient brain samples (32) and in the ovaries of fragile X premutation ovarian insufficiency (FXPOI) patients (45). Krans et al. (46) showed in vitro evidence that polyproline (poly(Pro_AS)), polyarginine (poly(Arg_AS)), and polyalanine (poly(Ala_AS)) expansion proteins are expressed across expanded antisense CCG transcripts. Similar to the sense FMR-poly(Gly) protein, both the poly(Pro_AS) and poly(Ala_AS) proteins were shown to accumulate in patient brains (46). Mechanistic studies by Kearse et al. (47) showed translation in both FMR-poly(Gly) and poly(Ala) reading frames depends on a 5′ cap, eIF4E, and eIF4A, suggesting a cap-binding and scanning mode of translation initiation for those reading frames. Similar to RAN translation across a CAG repeat (1), steady-state levels of individual RAN proteins vary by reading frame. When fused with green fluorescent protein (GFP), the GFP-FMR-poly(Gly) fusion protein was observed with as few as 30 repeats, whereas GFP-FMR-poly(Ala) was not detected at lengths below 88 repeats (32). Insertion of stop codons into the upstream region of the FMR-poly(Gly) prevented protein expression, whereas a similar insertion did not prevent FMR-poly(Ala) expression. Additional luciferase experiments show that the FMR-poly(Gly) protein initiates at close-cognate AUG-like codons upstream of the CGG repeat (47), and this initiation occurs independent of the repeat tract itself (47). Taken together, these data suggest that RNA structures independent of the FMR repeat may promote initiation at multiple upstream non-AUG start codons in the poly(Gly) reading frame. Almost half of mammalian mRNAs harbor upstream ORFs (uORFs), many of which initiate from close-cognate start codons (48-52).
In contrast to FMR-poly(Gly) expression and similar to SCA8 poly(Ala) expression (1), mutagenesis experiments suggest the FMR-poly(Ala) frame may initiate from within the repeat expansion (47), as stop codons inserted before the repeat did not prevent expression. Taken together, these results suggest RAN translation can differ mechanistically in different reading frames (53). Translation initiation in the FMR-poly(Gly) frame has more in common with initiation at close-cognate uORFs than with the more permissive characteristics of RAN translation observed in the FMR-poly(Ala) frame and in other repeat expansion disorders. RAN translation has been shown to occur in multiple reading frames, even when expansion mutations are located within larger ORFs (e.g. HD) (25). Similarly, RAN translation in the FMR-poly(Ala) reading frame occurs along with close-cognate uORF expression in the poly(Gly) frame (1,25). Taken together, these results highlight the complexity of translation mechanisms at repeat expansion loci.
The role of the FMR-poly(Gly) protein and the ubiquitin-positive inclusions was further characterized using inducible mouse models with 90 CGG repeats (54,55). Hukema et al. (54) showed that turning off expression of the CGG transgene, and therefore FMR-poly(Gly) expression, in these mice reduced the number of ubiquitin and FMR-poly(Gly) inclusions at 8 weeks and halted deterioration of eye movement abnormalities, suggesting a pathogenic contribution of the FMR-poly(Gly) protein. These mice were more recently used to examine anxiety, motor coordination deficits, and impaired gait, in which ablation of CGG transgene expression rescued behavioral but not motor phenotypes (55). Behavioral features in this model paralleled the formation of intranuclear inclusions in various brain regions (55). More recently, Sellier et al. (56) demonstrated that expression of FMR-poly(Gly) is pathogenic, whereas the sole expression of CGG RNA is not. Poly(Gly) inclusions have also been observed in the ovaries of FXPOI patients as well as in ovaries of older (40 weeks) but not younger (20 weeks) knockin CGG mice (45). These data suggest FMR-poly(Gly) may contribute to premature ovarian insufficiency.

Huntington's disease: RAN translation in an ORF
HD is a relentlessly progressive neurodegenerative disorder characterized by movement abnormalities, cognitive decline, and psychiatric problems (11). Most HD patients have expanded CAG repeats in the 38-55 repeat range and develop symptoms during middle age, but larger repeats (>60 CAG) cause a juvenile onset form of the disease (57). Most research into HD and other poly(Gln) disorders has focused on understanding the toxic effects of the poly(Gln) expansion proteins (58). Although RAN translation had been reported in a number of noncoding disorders, the location of the CAG repeats within canonical open reading frames and their smaller size made it unclear whether these active open reading frames also produce RAN proteins in alternative reading frames. Bañez-Coronel et al. (25) tested whether RAN translation can occur in a polyglutamine disease by examining the most common of these disorders, Huntington's disease. The authors showed that four novel homopolymeric RAN expansion proteins (poly(Ala) and poly(Ser) from the HTT sense transcript and poly(Leu) and poly(Cys) from the HTT antisense strand) accumulate in HD human autopsy brains. These proteins accumulate in affected brain regions, including the striatum and frontal cortex, and in regions with neuronal loss, microglial activation, and apoptosis. Some regions, including the caudate/putamen, showed both poly(Gln) and RAN protein staining, and other regions, including caudate and putamen white matter bundle regions, showed RAN but not poly(Gln) protein staining (25). These data suggest RAN proteins may play a role in previously described HD white matter abnormalities (59-62). Additionally, Bañez-Coronel et al. (25) found evidence for robust RAN protein but minimal poly(Gln) accumulation throughout the degenerating cerebellar layers of juvenile onset HD cases with severe cerebellar atrophy. The region-specific accumulation of HD-RAN proteins could indicate that the degradation pathways that handle RAN proteins and/or the process of RAN translation itself vary in efficiency between different cell types. Taken together, these data suggest RAN proteins play a role in the neurodegenerative changes and white matter abnormalities in HD (59-62).

Myotonic dystrophy type 2
In 2017, Zu et al. (63) showed that the myotonic dystrophy type 2 (DM2) intronic CCTG expansion mutation located in the cellular nucleic acid-binding protein (CNBP) gene can undergo both bidirectional transcription and RAN translation. DM2 (64) is a multisystemic disorder clinically similar to myotonic dystrophy type 1, which includes a late-onset CNS phenotype involving executive function deficits and white matter abnormalities (65,66). Although the DM2 expansions produce the same tetrapeptide repeat motif in all three reading frames, leucine-proline-alanine-cysteine (LPAC) in the sense direction and glutamine-alanine-glycine-arginine (QAGR) in the antisense direction (63), the C-terminal regions in each reading frame differ, resulting in the expression of six unique proteins. The LPAC and QAGR RAN proteins accumulate in DM2 autopsy brains in distinct patterns, with LPAC primarily found in gray matter and QAGR in white matter regions of the brain. Codon-replacement studies show LPAC and QAGR RAN proteins are toxic independent of CCUG- or CAGG-induced RNA GOF effects. These authors also showed that the nuclear sequestration of CCUG expansion transcripts into RNA foci decreases the steady-state levels of the LPAC RAN proteins. These data support a nuclear sequestration failure model in which RNA GOF effects predominate until the capacity to sequester expansion RNAs in the nuclei is exceeded and expansion RNAs are exported to the cytoplasm (sequestration failure), where they undergo RAN translation. This model predicts a highly variable pattern of RNA foci and RAN protein accumulation depending on the capacity of individual cells to sequester expansion RNAs and/or undergo RAN translation.
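The statement that the DM2 repeat encodes the same tetrapeptide motif in all three frames follows from the repeat unit being four nucleotides long: every frame reads the same 12-nucleotide cycle, just starting at a different point. The Python sketch below checks this for the CCUG and CAGG repeats using the standard genetic code. The helper names and the repeat length of 9 are arbitrary, and only the repeat tract is modeled, not the frame-specific C-terminal regions.

```python
from itertools import product

# Standard genetic code, codons ordered with bases U, C, A, G at each position.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def translate_frame(rna, frame):
    """Translate `rna` from offset `frame`, ignoring trailing bases."""
    codons = (rna[i:i + 3] for i in range(frame, len(rna) - 2, 3))
    return "".join(CODON_TABLE[c] for c in codons)

def is_rotation(a, b):
    """True if string `a` is a cyclic rotation of string `b`."""
    return len(a) == len(b) and a in b + b

# First four residues read from each frame of the sense and antisense repeats.
sense = [translate_frame("CCUG" * 9, f)[:4] for f in range(3)]
antisense = [translate_frame("CAGG" * 9, f)[:4] for f in range(3)]
print(sense, antisense)
```

Each frame starts at a different residue of the cycle, so the repeat tracts differ only in register, consistent with the six proteins being distinguished by their C-terminal regions.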

Spinocerebellar ataxia type 31: A pentanucleotide repeat
Spinocerebellar ataxia type 31 (SCA31) is an autosomal-dominant disease caused by expansion of a complex pentanucleotide repeat. The repeat tract, which includes TGGAA, TAGAA, TAAAATAGAA, and TAAAA repeat motifs, is located within an intron shared by the NEDD4-1 (BEAN1) and thymidine kinase 2 (TK2) genes (67). The presence of UGGAA-containing RNA foci in Purkinje cell nuclei of SCA31 patients but not controls supports a toxic RNA GOF mechanism (68). This repeat expansion is also translated into pentapeptide repeat (PPR) proteins (poly(Trp-Asn-Gly-Met-Glu)) that accumulate in both SCA31 patient brain autopsy tissue and Drosophila models (37). Because AUG-initiation codons are embedded within the repeat tract, it is unclear whether ATG-initiated or RAN translation is responsible for protein production. In either case, PPR protein production is repeat length-dependent, as these proteins are only detected in SCA31 patients and fly models with repeat expansions (37). Several RBPs, including TDP-43, were shown to suppress RNA foci formation and PPR protein accumulation (37). For TDP-43, inhibition of RNA aggregates occurred in an ATP-independent manner in vitro (37), suggesting that RBPs, like TDP-43, may play a role in RNA quality control and/or regulation of translation. Another AT-rich repeat, an (ATTTC) insertion in the noncoding region of the DAB1 gene, has been associated with SCA37 (69) and shows RNA aggregates in human cells overexpressing (ATTTC)58 but not (ATTTT)139 repeats. Although it remains to be determined whether bidirectional transcription and/or RAN translation occurs in SCA37, future experiments with these AT-rich repeats will provide insight into the mechanisms of neurodegenerative disorders.
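The point that AUG codons are embedded within the UGGAA tract can be illustrated by locating them in a pure repeat. The Python sketch below is a minimal illustration; the function name `aug_frames` and the repeat length of 6 are hypothetical choices, and the real SCA31 tract is a mixture of motifs rather than pure UGGAA.

```python
def aug_frames(unit, n_units):
    """Return the positions of AUG in a pure repeat and the frames they occupy."""
    rna = unit * n_units
    positions = [i for i in range(len(rna) - 2) if rna[i:i + 3] == "AUG"]
    frames = sorted({p % 3 for p in positions})
    return positions, frames

positions, frames = aug_frames("UGGAA", 6)
print(positions)  # one AUG spans each junction between adjacent UGGAA units
print(frames)     # frames (relative to the first base) that contain an AUG
```

Because the repeat unit is five nucleotides long, successive AUGs are offset by five bases and therefore cycle through all three reading frames, which is one reason sequence alone cannot separate AUG-initiated from RAN translation here.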

Fuchs endothelial corneal dystrophy (FECD)
The most recent repeat expansion disorder shown to express RAN proteins is FECD (70). FECD involves the slowly progressive degeneration of the corneal endothelium, which ultimately results in vision loss. An intronic CTG repeat located in the third intron of the transcription factor 4 (TCF4) gene was shown to be a common genetic cause of FECD (71). The CTG expansion mutation results in RNA foci and RNA splicing defects in endothelial cells from FECD patients, supporting an RNA GOF mechanism for this disease (72,73). Soragni et al. (70) recently demonstrated that the intronic CTG·CAG expansion in TCF4 undergoes RAN translation in transfected cells and that the resulting antisense poly(Ser) and poly(Gln) RAN proteins are toxic to immortalized corneal endothelial cells. Additionally, the authors developed a C-terminal antibody against the sense poly(Cys) expansion protein expressed from the TCF4 CUG expansion RNAs and provided evidence for the accumulation of a poly(Cys) protein in patients' endothelial samples (70). These data provide another opportunity to understand the mechanisms of RAN translation and the tissue-specific pathogenic consequences of the protein products of a repeat expansion.

C9ORF72 ALS/FTD: Accelerating the pace of RAN translation discovery
The 2011 discovery that the most common known genetic forms of ALS and FTD are caused by a hexanucleotide (G4C2) expansion in the C9ORF72 gene (74, 75) has raised an enormous level of scientific interest because it linked the microsatellite expansion field to more common neurodegenerative diseases like ALS and dementia. C9-ALS/FTD mutation carriers may develop ALS, which causes upper and lower motor neuron loss and muscle atrophy, typically leading to death from respiratory failure within 3-5 years of onset (76). This mutation can also result in FTD, a disease characterized by behavioral and personality changes with language dysfunction followed by dementia later in the disease (77, 78). Disease mechanisms previously described in other repeat expansion diseases (4, 79) have been proposed for C9-ALS/FTD (80, 81), including the following: 1) protein LOF due to C9ORF72 protein haploinsufficiency (74,75,82); 2) RNA GOF and RNA processing abnormalities caused by sequestration of one or more RNA-binding proteins to C9-expansion RNAs (79, 84-89); and 3) RAN protein toxicity (33-35). Bidirectional transcription, another common feature of expansion mutations (2,28,90,91), is also found in C9-ALS/FTD. Both sense and antisense transcripts accumulate as RNA foci and produce RAN proteins (sense: poly(Gly-Ala) (GA), poly(Gly-Arg) (GR), and poly(Gly-Pro) (GP); and antisense: poly(Gly-Pro) (GP), poly(Pro-Arg) (PR), and poly(Pro-Ala) (PA)) that accumulate in patient autopsy brains (33-35).
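The six dipeptide repeat proteins listed above can be derived from the G4C2 repeat by reading the sense transcript and its reverse complement in all three frames. The Python sketch below illustrates this under the standard genetic code; the helper names and the repeat length of 6 are arbitrary choices for illustration, and residue pairs are reported in alphabetical one-letter order rather than in the conventional protein names.

```python
from itertools import product

# Standard genetic code, codons ordered with bases U, C, A, G at each position.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def translate_frame(rna, frame):
    """Translate `rna` from offset `frame`, ignoring trailing bases."""
    codons = (rna[i:i + 3] for i in range(frame, len(rna) - 2, 3))
    return "".join(CODON_TABLE[c] for c in codons)

def reverse_complement(rna):
    """Return the reverse complement of an RNA string."""
    return rna.translate(str.maketrans("ACGU", "UGCA"))[::-1]

def dipeptide_residues(unit, n_units=6):
    """For each frame, return the residues used, as a sorted one-letter string."""
    rna = unit * n_units
    return ["".join(sorted(set(translate_frame(rna, f)))) for f in range(3)]

sense_unit = "GGGGCC"
antisense_unit = reverse_complement(sense_unit)  # GGCCCC
sense = dipeptide_residues(sense_unit)
antisense = dipeptide_residues(antisense_unit)
print(sense, antisense)
```

The sense frames use the residue pairs Gly/Ala, Gly/Pro, and Gly/Arg, and the antisense frames use Gly/Pro, Pro/Ala, and Pro/Arg, so poly(GP) is the only product encoded in both directions.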
The mechanism of translational initiation from the C9ORF72 repeat expansion has been the subject of investigation. Using a cell-free in vitro translation system, Tabet et al. (92) showed that translation from the expanded G4C2 transcript operates via a 5′-3′ cap-dependent scanning mechanism that utilizes an upstream CUG codon and eIF4E and is regulated by a uORF. Similarly, Green et al. (93) showed that translation across G4C2 repeat expansions is cap- and eIF4A-dependent and utilizes the same near-cognate initiation codon. In contrast, data generated using cell-based studies by Cheng et al. (94) support cap-independent translation initiation for C9ORF72 expansion transcripts. Data from this cellular system support a model in which translation initiation occurs on uncapped spliced repeat-containing intronic RNA following export to the cytoplasm. Similar to IRES-driven translation, C9 cap-independent translation was shown to be up-regulated by ER stress pathways, through eIF2α phosphorylation. Cheng et al. (94) and Green et al. (93) speculate that disease is exacerbated by a feed-forward mechanism in which RAN proteins increase ER and oxidative stress, which leads to increased eIF2α phosphorylation and RAN protein expression. Additionally, increased R-loop formation and double-strand breaks as well as defective ataxia telangiectasia-mutated (ATM)-mediated repair associated with C9-expanded repeats (95) may also factor into this process by promoting intracellular stress.
In support of a possible role for protein loss-of-function mechanisms in C9-ALS/FTD, lower levels of C9ORF72 protein are observed in C9-ALS/FTD patients (75,82,87,96,97). Arguments against a role for this mechanism include that there are no known cases of ALS or FTD patients with null or missense C9ORF72 mutations. Furthermore, C9ORF72 knockout mice display an altered immune response but do not develop motor neuron degeneration or other features of ALS or FTD (98-103), suggesting loss-of-function of C9ORF72 is not a primary driver of disease. Data supporting an RNA gain-of-function mechanism include the accumulation of both sense and antisense RNA foci in C9 patients and various model systems (35, 75, 104), splicing defects in C9 patient cells (105-107), and the in vitro identification of multiple potential C9ORF72 RNA-binding proteins (96, 104, 108-116). These findings, particularly the C9-associated RNA-binding proteins, have shown considerable variability, leaving the contributions of specific RBPs in disease unclear. There has also been considerable research focused on understanding the role of RAN proteins in C9-ALS/FTD. Both the sense and antisense dipeptide repeat (DPR) or C9-RAN proteins accumulate as aggregates in the autopsy tissue of C9 patients. DPR proteins are a hallmark feature of C9-ALS/FTD and have been found in neurons and glia throughout the brain and spinal cord (34,111,117). Although there is general agreement that RAN proteins are toxic in model systems, the contribution of individual RAN proteins to disease is the subject of debate.
When overexpressed or delivered to cultured cells or animal models, C9-RAN proteins are generally toxic (33-35, 84, 87, 116, 118-124), with the arginine-containing proteins (poly(PR) and poly(GR)) showing the strongest toxicity in most studies (84,119,120,125). PR and GR proteins interact with nuclear proteins, causing splicing aberrations (118,126), nucleolar stress (118,127), abnormal stress granule formation (120), and translational dysregulation (120,128). PR and GR proteins interact with other low complexity domain proteins, altering the physiology of phase separation and impairing the assembly and function of membrane-less organelles (125,126). PR and GR RAN proteins have also been shown to associate with the U2 small nuclear ribonucleoprotein, resulting in its cytoplasmic mislocalization and the blockage of spliceosome assembly and splicing (129), including genes related to mitochondrial, neuronal, and pre-mRNA splicing function. The GA RAN protein, which is moderately toxic in cell culture (116,121,122), was also shown to be toxic in zebrafish (130) and to induce neurodegeneration when overexpressed by AAV delivery in mice (116). GA proteins have been reported to interact with components of the ubiquitin proteasome system (UPS) and UPS-related proteins (116,131) and to cause disruptions in the UPS (116,121,122). Additionally, long poly(GA)80 proteins have been shown to recruit poly(GR)80 into cytoplasmic inclusions and thereby partially decrease GR-induced toxicity and Notch signaling defects (132). Recently, an elegant 3D cryoelectron tomography study by Guo et al. (133) showed that poly(GA) proteins form aggregates that selectively trap macromolecular complexes, including proteasomes, which may contribute to protein homeostasis problems. Interestingly, poly(Gln) aggregates have a fibril-like structure and cause vesicle and ER membrane deformation (134).
It is important to note that the majority of these toxicity studies have utilized relatively short repeats (<90 repeats) compared with the hundreds or thousands of repeats found in C9-ALS/FTD patients. The length of the repeat tract can influence a number of events, including nuclear/cytoplasmic localization (118,132,135), inclusion formation (121), and toxicity (118,128). Understanding the structure, behavior, and toxicity of additional types of RAN proteins found in patients, where repeats can be hundreds or thousands of units long, will be important for understanding the pathogenic mechanisms of C9-ALS/FTD and other RAN protein diseases.

Nucleocytoplasmic transport
Deficits in the nuclear pore complex (NPC) and in nucleocytoplasmic transport have been associated with several repeat expansion disorders. In yeast, chromosomal DNA with expanded CAG repeats associates with nuclear pores and NPC-associated factors (136). In DM1, the transcription factor SHARP (SMRT/HDAC1-associated repressor protein) is mislocalized to the cytoplasm, due to increased CRM1-mediated export, although the nucleocytoplasmic shuttling of other CRM1-mediated targets is not affected (137). In HD, the HTT protein contains a highly conserved nuclear export signal (138), and NPC components have been detected in synthetic poly(Gln) aggregates (139). Huntingtin N-terminal poly(Gln) fragments reduce huntingtin interaction with the nuclear export protein, translocated promoter region (TPR) (140). Additionally, poly(Gln) expansion proteins from repeat expansion diseases have reduced rates of nuclear export (141,142) or nuclear pore abnormalities (143,144). These older studies were conducted prior to the discovery of RAN translation and warrant reinvestigation to examine the potential contribution of RAN proteins to alterations in the nuclear pore complex and nucleocytoplasmic transport. Grima et al. (145) recently demonstrated that severe mislocalization and aggregation of nucleoporins (NUPs) and defective nucleocytoplasmic transport can be induced by a mixture of HD-RAN proteins and mediated by RanGAP. Another group, Gasset-Rosa et al. (146), found, using HD mouse models, multiple nuclear membrane or NPC defects, including aggregated nuclear pore factors that appeared to colocalize with mouse HTT aggregates. In another study, cytoplasmic, but not nuclear, aggregates of amyloid-like proteins, including mutant huntingtin, interfered with nucleocytoplasmic transport of both proteins and RNA (147). Expression of FMR-poly(Gly) has also been linked to altered nuclear lamina architecture that may contribute to FXTAS (56).
In C9ORF72 ALS/FTD, both the direct interaction of C9ORF72-RNA fragments with nuclear pore complex proteins (86) and the disruption of NPC function by RAN proteins (85, 148-150) have been suggested as neurodegenerative pathways. Additional support for the role of nucleocytoplasmic transport deficits comes from the observation that misregulated nuclear transport factors are found in multiple forms of ALS and FTD patient autopsy material and patient-derived induced pluripotent stem cell neurons (86, 151-153). TDP-43 pathology also triggers structural defects in the NPC and disrupts nucleocytoplasmic transport across multiple types of ALS/FTD (154). Several studies have identified nucleocytoplasmic transport components, including RanGAP (86), as suppressors or enhancers of disease using Drosophila (84,86), yeast (85), and siRNA-based human cell (155) screens. Additionally, RAN poly(PR) peptides hinder nuclear export, likely by plugging the central channel of the nuclear pore through a direct interaction between poly(PR) and nuclear pore proteins enriched in phenylalanine/glycine repeats (156). Although nuclear pore pathology is common in many neurodegenerative diseases, further studies are needed to understand whether these deficits are a cause or consequence of other cellular problems.

Mouse models
Although there has been considerable progress in understanding C9-ALS/FTD from the analysis of cell culture, simple animal models, and patient-derived tissues, progress on the development of transgenic mouse models has been slow until recently (98-103, 157-162). Because the regulation of the sense and antisense genes is complex, several groups decided to generate BAC transgenic models of C9ORF72 ALS/FTD to allow expression of these overlapping genes to be driven by their endogenous human promoters. Whereas animals from all four BAC transgenic models produce RAN proteins and RNA foci, disease presentation varies greatly. Two groups did not observe behavioral or pathological phenotypes (157,158). Mice developed by a third group showed spatial learning and working memory deficits with mild hippocampal degeneration, but not the severe neurodegenerative phenotypes found in patients with C9ORF72 ALS/FTD (159). A more severe phenotype was observed in a BAC transgenic mouse model developed by a fourth group (160). These mice show classic features of both ALS and FTD, including decreased survival, paralysis, muscle denervation, motor neuron loss, anxiety-like behavior, and cortical and hippocampal neurodegeneration. Although both sense and antisense foci are found in these mice, antisense foci preferentially accumulate in ALS/FTD-vulnerable cell populations. As these animals age and the disease progresses, RAN protein accumulation increases, with end-stage animals displaying the typical TDP-43 inclusions in degenerating regions of the brain. A recent transgenic mouse model, expressing poly(GA) protein independent of RNA GOF effects, demonstrated that accumulation of the poly(GA) protein results in mild motor phenotypes, including gait and balance abnormalities, but not the overt phenotypes typical of ALS/FTD, including paralysis and death (163). Similar to other models, RAN protein inclusions develop before symptoms appear in the poly(GA) mice (163), supporting the idea that the molecular effects of these mutations precede overt disease phenotypes. Mouse models with ALS/FTD phenotypes are critical for the development and testing of therapeutic strategies. Additionally, understanding and comparing the molecular differences between phenotypic and nonphenotypic models should provide insight into disease modifiers.

Therapeutic approaches
The plethora of potential pathogenic elements associated with expanded repeats (Fig. 2) makes it complicated to develop and assess therapeutic interventions. Antisense oligonucleotides (ASOs), which mediate cleavage of target RNAs via nuclear RNase H, have been or are currently in clinical trials for targeting the sense transcripts for both HD (164,165) and DM1 (166,167). It is important to note that in these trials, the antisense transcripts and/or RAN proteins would not be downregulated. ASO strategies are also being developed for other expansion disorders, including SCA2, SCA3, and C9ORF72 (168,169). A single dose of sense-transcript targeting ASOs administered in a C9-BAC mouse model decreased expanded C9ORF72 transcript levels, sense foci, and both poly(GA) and poly(GP) protein levels (159). Although ASO injections in older mice improved cognitive test performance, it is important to note that these mice do not develop TDP-43 inclusions or the motor neuron loss characteristic of C9ORF72 ALS/FTD (159). Interestingly, beneficial effects in treated mice were observed 6 months following injection, at a time when the expanded RNA transcript levels were no longer reduced but poly(GP) and poly(GA) levels remained lower (159). An alternative ASO approach has utilized the knockdown of the SUPT4H1-SUPT5H transcriptional elongation factor complex, which reduces transcription of genes with long stretches of expanded repeats without genome-wide changes in the expression of other RNAs (170,171). Reducing SUPT4H1 levels by either ASO knockdown or genetic deletion results in reductions in mutant-expanded HTT mRNA and Htt aggregates and in phenotypic recovery (172). A similar knockdown of SUPT5H in patient-derived C9ORF72 cells reduced both sense and antisense RNA foci as well as poly(GP) protein without large-scale changes in other transcripts (170). While promising, this treatment strategy does not reduce expanded repeat transcript levels as much as direct targeting with ASOs (100,159,170), and SUPT4H1/SUPT5H alters the expression of a number of other genes (170) that may have deleterious consequences.
Additional approaches, including the use of small molecules, are being applied to repeat expansion disorders. These strategies include blocking transcription (173,174) and targeting the expansion RNAs (175-181) or downstream cellular processes (122, 182-186). Targeting nucleocytoplasmic transport defects associated with repeat diseases has also been shown to be neuroprotective in both C9ORF72-ALS models (86,187) and HD mouse models (145). Depletion of SRSF1, which inhibits nuclear export of C9ORF72 expansion transcripts by preventing its interaction with its nuclear export receptor, has been shown to prevent neurodegeneration and locomotor deficits in flies (187). In contrast, approaches that focus on inhibiting sequestration of expansion transcripts by RBPs may allow expansion transcripts to be exported to the cytoplasm, undergo RAN translation, and further exacerbate the disease (63). Additional efforts have focused on targeting RAN proteins, and anti-GA antibodies have been shown to inhibit intracellular poly(GA) aggregation in cell culture and to block seeding activity in brain extracts (188). Overexpression of the small heat shock protein B8 (HSPB8), which modulates autophagy-mediated disposal of misfolded aggregation-prone proteins (189), was recently shown to decrease the accumulation of most C9-RAN proteins (190). Targeting RAN proteins directly has yet to be tested for therapeutic potential in more complex disease models, but doing so may also provide clues to the pathogenic role of RNA GOF versus RAN translation. A better understanding of the mechanisms of RAN translation and the role of individual RAN proteins in disease is likely to provide novel therapeutic opportunities.
Figure 2. An illustration of the three nonexclusive disease mechanisms proposed for most microsatellite expansion disorders, using C9ORF72 ALS/FTD as an example, is shown. A, microsatellite repeat expansion mutation (G4C2·G2C4 for C9-ALS/FTD) results in transcriptional inhibition and/or epigenetic silencing that reduces the levels of the resulting protein product (75,82,83). B, expansion mutations produce up to six toxic RAN proteins from both sense and antisense mutant transcripts. These proteins disrupt normal cellular functions (e.g. nucleocytoplasmic transport) and/or overwhelm cellular coping mechanisms (e.g. protein homeostasis). In C9-ALS/FTD, protein GOF effects lead to nucleolar dysfunction, ER stress, altered autophagy, cell-to-cell transmission of RAN proteins, and nucleocytoplasmic transport and nuclear envelope deficits (33-36). C, expansion RNAs sequester RBPs into nuclear foci, reducing the RBPs' availability and decreasing their normal function. Expansion transcripts may also interact with and disrupt the function of other cellular components, such as proteins of the nuclear pore complex (86). Although the identity and altered function of the RBP for C9ORF72 hexanucleotide repeats are the subject of much debate (83, 96, 108, 110, 113-115), RNA GOF effects are well established for DM1 CUG repeats that sequester MBNL proteins (79, 84-88). Different therapeutic approaches (purple boxes) target the various expansion RNA and protein products. ASOs and small molecules (SM) have been used to target either the sense or antisense expansion RNAs, although the effect on the opposite strand is unclear. Alternatively, ASOs can target transcription of the expanded repeat, e.g. SUPT5H (170). Additionally, therapeutic approaches, including small molecules, have been aimed at the downstream consequences of the expansion mutations, such as increasing or improving protein clearance mechanisms. Antibodies against RAN proteins (188) (Ab) or overexpressing proteins involved in autophagy (190) are also therapeutic approaches.

Summary and future directions
RAN proteins have now been reported in eight repeat expansion disorders with different repeat motifs, pathogenic thresholds, and disease presentations. Although significant progress has been made in understanding the role of RAN proteins in disease, additional insights into the mechanisms of RAN translation will facilitate the identification of new therapeutic targets and advance our understanding of cell biology and protein translation. Although the structure, function, and C-terminal regions of individual RAN proteins differ and warrant independent consideration, targeting RAN translation could provide a single therapeutic strategy likely to impact an entire category of repeat expansion diseases. Additionally, the development of new tools and strategies to study RAN translation and RAN proteins is needed, along with models that mimic the full spectrum of molecular, pathological, and behavioral features of disease seen in patients. In summary, RAN translation is a complex biological process that we are only beginning to understand. Given the prevalence of repetitive elements in the human genome, RAN translation is likely to be found in additional diseases and possibly also contributes to normal cellular biology.