Influenza A Virus Polymerase: Structural Insights into Replication and Host Adaptation Mechanisms*

The heterotrimeric RNA-dependent RNA polymerase of influenza viruses catalyzes RNA replication and transcription activities in infected cell nuclei. The nucleotide polymerization activity is common to both replication and transcription processes, with an additional cap-snatching function being employed during transcription to steal short 5′-capped RNA primers from host mRNAs. Cap-binding, endonuclease, and polymerase activities have long been studied biochemically, but structural studies on the polymerase and its subunits have been hindered by difficulties in producing sufficient quantities of material. Recently, because of heightened effort and advances in expression and crystallization technologies, a series of high resolution structures of individual domains have been determined. These shed light on intrinsic activities of the polymerase, including cap snatching, subunit association, and nucleocytoplasmic transport, and open up the possibility of structure-guided development of new polymerase inhibitors. Furthermore, the activity of influenza polymerase is highly host- and cell type-specific, being dependent on the identity of a few key amino acid positions in the different subunits, especially in the C-terminal region of PB2. New structures demonstrate the surface exposure of these residues, consistent with ideas that they might modulate interactions with host-specific factors that enhance or restrict activity. Recent proteomic and genome-wide interactome and RNA interference screens have suggested the identities of some of these potential regulators of polymerase function.

Influenza A viruses are important viral pathogens of humans and animals. In humans, they cause both yearly seasonal influenza epidemics and more extensive global outbreaks termed pandemics. The 1918 pandemic killed 50 million people, and those in 1957 and 1968 also caused serious mortalities. Although of relatively low virulence, the current swine origin H1N1 pandemic has shown that outbreaks can occur suddenly and unexpectedly despite constant worldwide surveillance. Influenza is also a major animal pathogen affecting domestic poultry and pigs, resulting in significant economic impact with, for example, sudden loss of whole flocks from sickness or preventative culling. All influenza A viruses originate from wild waterfowl that are generally asymptomatic during infection. From this origin, they can infect humans, either directly or via a domestic animal intermediate such as poultry or pigs. The infection of humans by avian viruses does not result in a sustainable pathogen because of poor human-to-human transmission; however, there are continual concerns that avian viruses such as the current highly virulent H5N1 avian strain may adapt and become a serious threat.
The influenza A virus is a member of the Orthomyxoviridae family possessing a negative-sense single-stranded RNA genome that is divided into eight viral RNA (vRNA) genomic segments that encode 10 major proteins (1). The three largest vRNAs encode the three subunits of the RNA-dependent RNA polymerase: the acidic subunit PA and the two basic subunits PB1 and PB2. There are also two minor pb1 gene products: an N-terminally truncated form of PB1 originating from an alternative start codon (2) and PB1-F2, a short polypeptide expressed from an alternative +1 reading frame that seems to increase the virulence of some strains (3). The medium segments encode the nucleoprotein (NP), which together with the polymerase subunits and the vRNA forms the ribonucleoprotein (RNP), and the two viral glycoproteins (4), hemagglutinin (HA) and neuraminidase (NA). HA is the major protein on the viral surface, and it binds cellular sialic acid receptors, leading to virus uptake. Once in the cell, fusion of viral and endosomal membranes occurs, and RNPs are released into the cytoplasm. NA is the enzyme that cleaves these same sugars from the surfaces of cells and new viruses during viral budding. There are two short vRNA segments that each encode two proteins. The first generates matrix protein (M1) that lines the internal surface of the viral lipid membrane and an ion channel (M2) that mediates the uncoating of the viral particle during infection and is the target for the drug amantadine. The second short segment encodes NS1, which is a significant virulence factor involved in evasion of the innate immune system (1), and NS2 (also known as nuclear export protein), which exports viral RNPs from the nucleus into the cytoplasm (5).
* This work was supported by European Union FLUPOL Contract SP5B-CT-
The influenza polymerase has no proofreading activity, resulting in a high gene mutation rate of approximately one error per replicated genome (6), so each cell can produce 10,000 new viral mutants to infect neighboring cells. This is crucial to the virus's evolutionary strategy, as continual changes in the glycoprotein sequences, notably HA, lead to evasion of the host antibody response, so-called "antigenic drift," which underlies the inevitable escape from seasonal vaccines. Likewise, mutations in genes of the M2 and NA proteins have rapidly rendered the antiviral drugs amantadine and oseltamivir inactive against these targets. The polymerase itself is the target for new antiviral drugs (7), including T-705, an inhibitor in current late stage development (8). Experience suggests that, under selective pressure from drug use, the error-prone aspect of the very activity against which the inhibitors are targeted will lead to resistance. The segmented structure of the genome also contributes to the rapid evolution of new influenza viruses through a process called reassortment, where segments from different co-infecting viruses are packaged into new viral particles. The generation of reassortment viruses containing genes from avian and human viruses is thought to arise most commonly in pigs because these animals are susceptible to both viral types due to the presence of avian-like α2,3-linked and human-like α2,6-linked sialic acid cell-surface receptors. Through reassortment, an avian virus may evolve suddenly into a human pathogen by combining polymerase subunits and HA (and also other viral proteins, e.g. NS1) that function efficiently in human cells (9). If the HA protein is immunologically distinct from circulating varieties, "antigenic shift" can occur, resulting in a pandemic strain to which the population has no immunological protection. The current novel pandemic H1N1 strain arose through such a triple-reassortment process, combining avian PB2 and PA polymerase subunits, human PB1, and classic swine HA (10).
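The combinatorial scope of reassortment can be illustrated with a minimal sketch. It assumes, purely for illustration, that each of the eight segments is drawn independently from one of the co-infecting parents and that all combinations are equally possible, which ignores packaging and fitness constraints. The strain labels and the function name `reassortants` are hypothetical.

```python
from itertools import product

# The eight genomic segments, named by their main gene products as in the text.
SEGMENTS = ["PB2", "PB1", "PA", "HA", "NP", "NA", "M", "NS"]

def reassortants(parents):
    """Enumerate every genotype in which each segment comes from one parent.

    `parents` is a list of hypothetical strain labels. Idealized assumption:
    every segment combination is possible and no combination is excluded.
    """
    return [dict(zip(SEGMENTS, combo))
            for combo in product(parents, repeat=len(SEGMENTS))]

two_parent = reassortants(["avian", "human"])
# Genotypes that take all eight segments from one parent are not reassortants.
novel = [g for g in two_parent if len(set(g.values())) > 1]

three_parent = reassortants(["avian", "human", "swine"])
# Genotypes matching the segment origins stated in the text for pandemic H1N1;
# the origins of the remaining segments are left unconstrained here.
h1n1_like = [g for g in three_parent
             if g["PB2"] == "avian" and g["PA"] == "avian"
             and g["PB1"] == "human" and g["HA"] == "swine"]

print(len(two_parent), len(novel), len(three_parent), len(h1n1_like))
```

Even under this idealized model, two co-infecting parents give 2^8 = 256 genotypes, of which 254 are true reassortants, which indicates how quickly segment exchange can generate novel viruses.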
The influenza polymerase is therefore essential to the biological processes of (i) virus replication in cells by replicating the vRNA segments and transcribing their genes and (ii) virus evolution through its error-prone RNA replication, producing variants of the viral proteins, including the glycoproteins and the polymerase subunits themselves, which leads to viruses that are better adapted to new host species (11). Additionally, and in common with other viral replicases, it represents a promising drug target due to its activities that are distinct from those found in the host cell (7). Yet despite its biological interest and medical importance, the absence of detailed structural information on the polymerase has limited our mechanistic understanding and our ability to design better drugs. The main reason for this absence is an overwhelming difficulty in producing purified polymerase proteins in sufficient quantities for study. Since 2007, because of heightened effort and advances in expression and crystallization technologies, a series of x-ray and NMR structures of domains from the PA, PB1, and PB2 subunits that cover approximately half of the trimeric complex have been determined. Here, we review this structural progress and discuss our improved understanding of the intrinsic polymerase mechanism, as well as its role in adaptation to the cellular environments of different host species.

Genome Replication and Transcription Activities of the Polymerase
In the intact viral particle, each RNP contains a single polymerase complex associated with the conserved 5′- and 3′-ends of each vRNA segment. The vRNA is complexed with NP, with each protomer contacting 24 nucleotides (12). Direct protein contacts also occur between NP and the polymerase, as suggested by several biochemical studies (13-17) and visualized recently in a high resolution cryo-electron microscopy reconstruction comprising a synthetic minimal vRNA, nine NP monomers, and a polymerase trimer (18). The pre-existence of a functional polymerase trimer in the infective particle is necessary to initiate the first transcription and RNA replication cycles because the negative-sense vRNA cannot be directly translated into protein. Once de novo polymerase subunits are synthesized in the cytoplasm, they are transported back into the nucleus for assembly into trimers (19), thus allowing further transcription and genome replication cycles.
Replication of the viral genome is catalyzed by the polymerase via a cRNA intermediate. This is then copied back into vRNA segments that are used as templates for transcription and further replication in the nucleus and are exported as RNPs to the cytoplasm, where they are sorted into new viral particles via recognition of terminal packaging sequences (20,21). In transcription, mRNA is generated by directly copying from the vRNA template, with polyadenylation occurring via a stuttering mechanism on a templated oligo(U) terminal sequence. Unlike RNA replication, the process is primer-dependent, and the virus obtains the 5′-primer by a mechanism shared with other segmented RNA viruses (e.g. bunyaviruses) called "cap snatching" (Fig. 1) (22). The polymerase binds the 5′,7-methylguanosine cap of a nuclear pre-mRNA and cleaves it 9-15 nucleotides downstream. The resulting RNA oligonucleotide is used to initiate transcription from the vRNAs, resulting in capped, polyadenylated, positive-sense mRNAs that resemble host cell messages. These are exported from the nucleus for translation in the cytoplasm, possibly via a CRM1-independent pathway (23).
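The transcription steps described above can be summarized as a toy string model. This is a deliberately simplified sketch and not a biochemical simulation: the cap is treated as implicit at the 5′-end of the host sequence, the template is assumed to end at its oligo(U) run, and stuttering is modeled as recopying that run a fixed number of times. All sequences, function names, and the `stutter_repeats` parameter are illustrative assumptions.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def snatch_cap(host_pre_mrna, cut):
    """Return the capped primer left after cleavage `cut` nucleotides
    downstream of the (implicit) 5'-cap."""
    if not 9 <= cut <= 15:
        raise ValueError("the text gives a cleavage window of 9-15 nucleotides")
    return host_pre_mrna[:cut]

def transcribe(vrna_3to5):
    """Copy a template written 3'->5' into its complement written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in vrna_3to5)

def chimeric_mrna(host_pre_mrna, vrna_3to5, cut=12, stutter_repeats=5):
    """Assemble host-derived primer + viral transcript + poly(A) tail."""
    primer = snatch_cap(host_pre_mrna, cut)
    body = transcribe(vrna_3to5)
    # Length of the terminal oligo(U) run that the polymerase stutters on.
    u_run = len(vrna_3to5) - len(vrna_3to5.rstrip("U"))
    # Toy assumption: each extra pass over the U run adds u_run adenosines.
    tail = "A" * (u_run * stutter_repeats)
    return primer + body + tail

# Illustrative sequences only; they are not taken from any database entry.
host = "GCAUUGCCAUGGAACUUAGC"
template = "UCGUUUUCGUCCUUUUUU"
mrna = chimeric_mrna(host, template, cut=12, stutter_repeats=2)
print(mrna)
```

The output is chimeric in the sense used in the text: its first `cut` nucleotides come from the host message and the remainder is templated by the vRNA.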
PA Subunit-The PA subunit has no significant homology to other proteins, and for a long time, the function was unclear. Various functions were proposed, including a chymotrypsin-like serine protease (24-26) and various aspects of RNA replication (27-30). A soluble subunit could be expressed in insect cells for limited proteolysis studies, revealing cleavage into N-terminal 25-kDa and C-terminal 55-kDa fragments, indicative of stable domains (31,32). The structure of the C-terminal domain complexed with a short PB1 N-terminal peptide was determined by two groups (33,34), showing how the PB1 fragment is gripped in a highly conserved cleft resembling "jaws" in the "head of a dragon" (Fig. 2) (33). The essential nature of this interaction was demonstrated by mutagenesis of the interface (34), and sequence analysis revealed how the interfacial residues were resistant to mutational drift, probably because multiple compensatory mutations in both PA and PB1 subunits would be required. This interface has thus been proposed as possibly druggable, and a 25-residue PB1-derived peptide has been shown to inhibit polymerase assembly and virus replication in influenza A and B strains (35,36).
FIGURE 1. Cap-snatching transcription mechanism of influenza polymerase. The PA-PB1-PB2 complex is localized in the nucleus of the infected cell. During transcription, the PB2 subunit binds the 5′,7-methylguanosine cap of a host pre-mRNA molecule (red), which is subsequently cleaved 10-15 nucleotides downstream by the PA endonuclease. The resulting short capped RNA primer is used to initiate polymerization by the RNA-dependent RNA polymerase of the PB1 subunit using 5′- and 3′-bound vRNA (green) as template, resulting in capped, polyadenylated, chimeric mRNA molecules (red and blue) that are exported to the cytoplasm for translation into viral proteins.
Two groups recently determined the structure of the PA N-terminal domain (37,38). Only at this point did it become clear that the endonuclease activity was present in this subunit and not PB1 as suggested previously from work on purified polymerase complexes (39). The domain fold and active-site arrangement are similar to those of the PD-(D/E)XK family of nucleases (Fig. 3a). That the crystallized domain possessed the true endonuclease activity was confirmed by the hydrolysis of both single-stranded RNA and single-stranded DNA substrates (37), as observed previously with purified RNPs (40). Structure-based mutagenesis of key PA active-site residues in reconstituted polymerase trimer showed that endonuclease-independent RNA replication was maintained, whereas endonuclease-dependent transcription was abolished (38). A known specific inhibitor of influenza endonuclease activity in intact polymerase, dioxo-4-phenylbutanoic acid (41), strongly stabilized the purified domain in thermal shift assays and inhibited single-stranded RNA hydrolysis (37). The identity of the coordinated metal ions differs between the two structures, with one structure containing a single Mg²⁺ ion (38) and the other containing two Mn²⁺ ions coordinated in adjacent positions. The difference between the two observations is due in part to the addition of MnCl₂ to the crystallization medium (37), with Mn²⁺ preferentially binding to the site that includes a histidine ligand. This was deliberate and followed the observation of strong thermal stabilization and enhancement of endonuclease activity in the presence of Mn²⁺ or Co²⁺; both the identity of these ions and the presence of two metal-binding sites accord with a cooperative two-metal ion mechanism measured on purified RNPs (42).
PB1 Subunit-In contrast to the progress on PA and PB2, PB1 remains poorly structurally characterized. The central location of the polymerase domain is predicted from the presence of conserved motifs characteristic of segmented negative-strand RNA-dependent polymerases (43,44) and would suggest that it has a classic polymerase fold. However, expression of the full-length subunit has not yielded soluble material, and the domain-by-domain approach used successfully for the other subunits has not yet led to expression of an isolated polymerase domain at levels compatible with structural studies. Previous studies demonstrated that the N-terminal region of PB1 bound the C terminus of PA (35,45,46), and this was borne out in the PA C-terminal domain structures, where short N-terminal PB1 peptides were co-crystallized (Fig. 2) (33,34). At the other end of PB1, the interaction of its C-terminal region with the N terminus of PB2 has also been described biochemically (45,47). A recently determined x-ray structure of an 86-amino acid (aa) C-terminal fragment of PB1 with a 37-aa N-terminal peptide from PB2 showed how these α-helical subunit termini tightly co-fold (Fig. 2) (48). Despite its small size, this interaction interface is completely conserved across different avian and human influenza strains, and the absolute requirement of this interaction for function of the 250-kDa trimer suggests that it, like the interaction between PB1 and PA, could be a possible drug target.
PB2 Subunit-The PB2 subunit was initially identified as the site of cap binding through cross-linking studies (49,50), with further experiments suggesting regions 242-252 and 533-577 as interaction sites (39,51). Mutagenesis later confirmed this role of the PB2 subunit but isolated it to central residues Phe-363 and Phe-404 (52), apparently contradicting the location determined from cross-linking data. Other information was available on this subunit relating to its nucleocytoplasmic trafficking that was proposed to occur via multiple nuclear localization signals both internally and C-terminally located (53). In common with PA, the protein sequence of PB2 is unlike any other, preventing the use of sequence alignments in identifying structural domains. Unlike PA, full-length PB2 cannot be expressed solubly, precluding the use of limited proteolysis. Therefore, studies on this subunit were effectively blocked until the development of ESPRIT, a robotic random library-based construct screening process (54) that was used to systematically screen almost 90,000 random fragments of the pb2 gene, identifying a series of Escherichia coli-expressible soluble fragments for structural studies (55-57).
The first fragment identified was obtained from a 5′-pb2 deletion library and comprised a highly soluble, overexpressing C-terminal domain (aa 678-759) (Fig. 2) (56). Analysis of the NMR solution structure revealed a well folded domain fused to a relatively mobile region, with a sequence suggestive of a classic bipartite nuclear localization signal (NLS; ⁷³⁷RKRX₁₂KRIR⁷⁵⁵) that was previously thought to be monopartite (53). Nuclear translocation assays with the green fluorescent protein-fused "NLS domain" (also referred to as DPDE) and full-length PB2 in transfected cells confirmed this activity. Although the NLS domain did not crystallize independently, co-crystallization with the nuclear import receptor importin α5 yielded an x-ray structure detailing the bipartite nature of the NLS and illustrating how the NLS region becomes fully unfurled during importin binding. The clear assignment of importin-dependent binding to the C-terminal region put into question the internal NLS (residues 449-495) identified through gene deletion and cellular localization studies (53). Later structural data precisely defined the location and structure of the overlapping cap-binding domain (aa 318-483; discussed below), suggesting that aberrant nuclear transfer of the Δ449-495 construct may have been the consequence of PB2 structural destabilization.
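The bipartite motif quoted above (RKR, 12 arbitrary residues, then KRIR) translates directly into a pattern search. The following is a minimal sketch; the function name `find_bipartite_nls`, the `first_residue` parameter, and the test sequence are hypothetical, and the sequence is a synthetic placeholder rather than the real PB2 sequence.

```python
import re

# Motif as written in the text: RKR, any 12 residues (X12), then KRIR.
BIPARTITE_NLS = re.compile(r"RKR.{12}KRIR")

def find_bipartite_nls(seq, first_residue=1):
    """Locate the motif in a one-letter protein sequence.

    `first_residue` is the residue number of seq[0], so that coordinates can
    be reported in full-length numbering when `seq` is only a fragment.
    Returns (start, end, motif) with inclusive coordinates, or None.
    """
    match = BIPARTITE_NLS.search(seq)
    if match is None:
        return None
    start = match.start() + first_residue
    end = match.end() + first_residue - 1
    return (start, end, match.group())

# Synthetic placeholder: two flanking residues on each side, alanines as X12.
toy = "GS" + "RKR" + "A" * 12 + "KRIR" + "GN"
hit = find_bipartite_nls(toy)
print(hit)
```

The motif spans 3 + 12 + 4 = 19 residues, which agrees with the residue range 737-755 given in the text (755 - 737 + 1 = 19).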
Also identified by ESPRIT was a soluble central region with constructs spanning aa 241-483 (Fig. 2) (55). Because this contained residues implicated in cap binding (52), the purified fragments were assayed with m⁷GTP-Sepharose, confirming an intrinsic cap-binding activity. A proteolytically stable subfragment (aa 318-483) was observed during purification, and it was this fragment that ultimately crystallized in the presence of m⁷GTP. The resulting structure exhibited a new fold, consistent with the absence of homology to other proteins (Fig. 3b). Despite this, the methylated guanine base bound in a mode that is commonly observed in cap-binding proteins (e.g. cap-binding complex (58) and eIF4E (59)) whereby aromatic side chains sandwich the positively charged aromatic ring of the base. The interacting residues are completely conserved in all influenza strains, and their mutation abolished cap-dependent transcription but not cap-independent replication in recombinant mini-RNPs (55). The x-ray structures and availability of active E. coli-expressible PB2 cap-binding and PA endonuclease domains open the way to the development of drugs that target the endonuclease activity (7), a strategy that was pursued previously by pharmaceutical companies (41,60) but that proved unsuccessful due to the need to work without structures on low abundance whole RNPs purified from cells.

Host Adaptation and Role of Polymerase Subunits, Notably PB2
The World Health Organization maintains a global surveillance program that generates large amounts of sequence data from wild-type viral isolates (61), made accessible via databases at the National Center for Biotechnology Information (NCBI) and the Global Initiative on Sharing All Influenza Data (GISAID) (62,63). Comparative sequence analyses have identified numerous variations, among which must be those responsible for antigenic drift, shift, and host adaptation. The major roles of HA and NA in host specificity are well established, and the x-ray structures of these proteins have contributed a valuable molecular level understanding of their variants (4). Several studies have also identified host signatures in polymerase subunits that are correlated with adaptation of avian viruses to mammalian hosts (Refs. 11 and 64-67; reviewed in Ref. 68). Although sequence analyses highlight host-related differences, it is not always clear which are responsible for the host shift event rather than being the result of neutral drift (i.e. are passenger mutations alongside those that are functional), hence the importance of experimental testing of hypotheses. One recent computational study (67) used a statistical method that compensated for biases from neutral drift and the underlying phylogenetic relationships between sequences in the dataset. In common with previous observations (68), putative host-adaptive mutations were found with high confidence in the polymerase subunits: 2 in PA, 3 in PB1, and 13 in PB2. The abundance of such mutations in the PB2 subunit was also predicted previously (64-66, 69). For most of these mutations, little is known about their mode of action. A notable exception is PB2 627, which is almost invariably Glu in avian viruses and Lys in human viruses. A single E627K mutation in an otherwise non-human-infective avian virus is sufficient to confer host adaptation (66,70). Many studies (reviewed in Ref. 68) have sought to explain this observation, with a major hypothesis being that the E627K mutation confers improved replication in mammalian cells, particularly at lower temperatures (71), in accordance with the mammalian upper respiratory tract being significantly cooler (33-35°C) than the avian intestinal tract (37-40°C). There is currently little understanding of the mechanism by which E627K exerts its effect, although it appears to affect the PB2-NP interaction in a host cell-dependent manner (14,15,17).
A systematic E. coli expression screen of the PB2 subunit using the ESPRIT method resulted in isolation of a large C-terminal soluble fragment containing 8 of the 13 predicted host determinant residues (67) and the previously characterized NLS domain (Fig. 2) (56). This crystallized to reveal two domains packed tightly together via a polar interface (Fig. 4a) (57). A very similar structure, also from a human virus, was solved by a second group (72). The upstream domain, termed the 627 domain because it contains this major host determinant residue, was also crystallized, revealing the side chain to be solvent-exposed in both Lys and Glu forms (Fig. 4b) (57). The 627 domain has a highly basic surface patch that was disrupted in an avian-like glutamic acid mutant (57). The function of this 627-NLS double domain remains unclear despite these structures, although a possible role in RNA binding has been proposed (72). Interestingly, the pandemic H1N1 virus has Glu-627 but is highly transmissible between humans. Using the crystal structures of the 627 domain, this was explained by a compensating double mutation, G590S and Q591R, in which Arg-591 effectively shielded the negative charge of Glu-627, re-establishing the basic surface patch that seems to characterize human-infective viruses (73). Further mutation at position 627 of this pandemic virus has not occurred to any serious degree, and a laboratory-engineered variant containing both G590S and Q591R and Lys-627 showed no obvious replicative advantage, suggesting that little further selective advantage to the virus is provided (74).
A good correlation is observed between the putative host-adaptive mutations identified from sequence analysis studies of PA and PB2 and their location on the crystal structures; most are solvent-exposed surface residues (Fig. 4). This is consistent with the hypothesis that such mutations could mediate interactions with host cell factors necessary for more efficient polymerase activity or disrupt those with host restricting factors of the innate immune system. Such has been suggested for PB2, where replication of polymerases with avian Glu-627 is inhibited by a host restricting factor present in human cells (17), although it has not yet been identified. The interaction with NP is also implicated (14,15,17), but it remains to be seen if it is NP itself that is the 627-dependent interactor or a host molecule.
A number of interactions between RNPs and specific host factors have been described (reviewed in Ref. 68), although it is not known if they are involved in host range determination. For example, the transcription activator hCLE (75) and the RNA polymerase II C-terminal domain (76) have been demonstrated to bind the polymerase. Perhaps best understood is the association of PB2 with members of the nuclear import receptor family, importin α. The co-crystal structure of the PB2 C-terminal NLS domain with human importin α5, together with nuclear transport assays, revealed the mechanism by which this subunit was imported into the nucleus (56). PB2 also binds importin α1 in vitro (57) and importin α7 in human cells (77), the latter report suggesting an additional function of importins as necessary cofactors for replication activity. Previously, the PB2 mutation D701N (now known to be located in this domain) had been shown to facilitate adaptation of an avian virus to mice (78). This effect is apparently explained by the observation that D701N enhances binding to importin α1 in mammalian cells but not avian cells, with a resultant increase in PB2 nuclear accumulation (79). In vivo, D701N enhances mammalian infectivity of viruses with avian-like PB2 Glu-627, normally characteristic of low pathogenicity, both in guinea pigs (80) and in humans (81), being similar in its compensatory effect to the G590S and Q591R double mutation of the adjacent 627 domain (73). Thus, the importin-PB2 association may be one mechanism by which mutations can lead to host adaptation, by modulating either nuclear localization or polymerase activity in a host-specific manner.
In recent years, high throughput proteomic approaches and yeast two-hybrid (Y2H) and genome-wide RNA interference (RNAi) screens have increased manyfold the list of putative polymerase partners, most of which still need to be properly validated. Tandem affinity purification (TAP) strategies using polymerase (82,83) and RNPs (83) have been employed to isolate physically interacting host factors from cells, identifying various heat shock proteins and nuclear import factors. To a similar end, individual polymerase subunit baits were screened against a high quality human ORFeome library by Y2H, resulting in the description of a wide and diverse network of interactors (84). Despite screening for direct association, many interactions identified from TAP tag and Y2H experiments are difficult to rationalize from a biological perspective. Genome-wide RNAi screens in Drosophila (85) and human (86-88) cell lines have not aimed to identify direct influenza protein interactors per se but have revealed hundreds of host cell proteins whose presence appears to be essential for virus replication. Some overlap is observed between the host factor lists from these studies, e.g. the V-ATPase ATP6AP1 and COPI vesicle transport proteins are found in all RNAi screens using human cell lines, but there is also a significant variation between the different datasets (reviewed in Ref. 89). Despite the limited agreement between the proteins identified in these high throughput experiments, it may be that a subset of the binders identified in the interaction screens (TAP tag and Y2H) or the virus-associated host factors from the genome-wide RNAi screens might bind directly to the polymerase subunits in a manner modulated by their surface-exposed host-adaptive residues. The availability of well behaved, purifiable domains and their structures should help validate these interactions and explain at a molecular level the effects of polymerase-induced genetic mutations on polymerase interactions in the cellular environments of different host species.