Multiple NF-κB Sites in HIV-1 Subtype C Long Terminal Repeat Confer Superior Magnitude of Transcription and Thereby the Enhanced Viral Predominance

Background: Viral evolution of HIV-1 is dynamic and moving towards a higher order of replicative fitness.
Results: HIV-1 subtype C acquires an extra (fourth) NF-κB site to achieve a higher degree of transcription, which in turn enhances its replicative fitness and preponderance.
Conclusion: Subtype C with an extra NF-κB site adopts a novel strategy of strengthening its promoter to gain fitness.
Significance: Learning how the new strains could affect viral prevalence, pathogenesis, and disease management strategies is critical.
We demonstrate that at least three different promoter variant strains of HIV-1 subtype C have been gradually expanding and replacing the standard subtype C viruses in India, and possibly in South Africa and other global regions, over the past decade. The new viral strains contain an additional NF-κB, NF-κB-like, or RBEIII site in the viral promoter. Although the acquisition of an additional RBEIII site is a property shared by all HIV-1 subtypes, the acquisition of an additional NF-κB site remains exclusive to subtype C. The acquired κB site is genetically distinct, binds the p50-p65 heterodimer, and strengthens the viral promoter at the levels of transcription initiation and elongation. The 4-κB viruses dominate the 3-κB "isogenic" viral strains in pairwise competition assays in T-cell lines, primary cells, and the ecotropic human immunodeficiency virus mouse model. The dominance of the 4-κB viral strains is also evident in the natural context when subjects are coinfected with κB-variant viral strains. Mean plasma viral loads, but not CD4 counts, differ significantly in 4-κB infection, suggesting that these newly emerging strains are probably more infectious. Higher plasma viral loads may underlie selective transmission of the 4-κB viral strains. Several publications have previously reported duplication or deletion of diverse transcription factor-binding sites in the viral promoter. Unlike those reports, our study provides experimental evidence that the new viral strains gained a potential selective advantage as a consequence of the acquired transcription factor-binding sites and, importantly, that these strains have been expanding at the population level.

…duction precedes that of other subtypes. Second, subtype C appears to rapidly replace or dominate "founder" viral strains where its introduction follows that of other subtypes, as manifested in its competition against several other subtypes in the Democratic Republic of Congo, Tanzania, and South Africa; against subtype B on the South American continent, especially in southern Brazil; and against subtypes B and CRF01_AE in China. Finally, since its origin, subtype C has expanded faster than any other viral subtype. For instance, in southern Brazil, the proportion of subtype C infections increased from 3% in the 1990s (3) to 30% in 2002 (4) and eventually to 45-48% in recent years (5). Furthermore, since their introduction into India, the subtype C strains of India appear to have remained genetically stable without major recombination with other subtypes, unlike the strains in China or southern Brazil, although several circulating recombinant forms have been reported from India (6).
The regulatory elements of viruses could play an important role in conferring differences in replication fitness, as observed previously among the HIV-1 subtypes. The long terminal repeat (LTR) sequences of the viral subtypes are highly diverse, differing by up to 20-25% between subtypes (7). A comparison of the subtype C LTR (C-LTR) with those of other subtypes identified several distinct differences in the composition of the transcription factor-binding sites (TFBS), including the NF-κB (8, 9), NFAT, and upstream stimulatory factor (10) sites, and in other regulatory elements such as the TATA box and the TAR region (11-13). Among these LTR variations, the subtype-specific patterns within the enhancer element, which consists exclusively of NF-κB motifs, are important given the profound impact of NF-κB on gene expression from the viral promoter. In most viral subtypes, including the prototype subtype B virus, the enhancer consists of two identical, canonical κB motifs; subtypes A/E and C are the exceptions. In subtype A/E, the upstream NF-κB site is replaced with a GA-binding protein-binding motif (14), whereas the genetic variation within the subtype C enhancer is more complex. Although a large proportion of C-LTRs contain three NF-κB sites (8, 15, 16), a small minority contain a fourth motif that is either a canonical κB site or a κB-like site (8, 17-20). C-LTRs with only two κB sites have also been reported (16, 21). C-LTRs containing three NF-κB sites can show stronger transactivation than the LTRs of other subtypes, which contain two NF-κB sites (10, 12). Thus, unlike the other HIV-1 subtypes, the C-LTR shows significant variation in both the number and the sequence of its κB sites.
In subtype B, an insertion of unique sequences 15-34 bp in length immediately upstream of the viral enhancer element was reported in ~38% of isolates in the 1990s (22) and in at least 14.2% of the viral isolates in our present analysis, adding another level of complexity to the genetic diversity of this important regulatory region. These insertions, commonly known as the most frequent naturally occurring length polymorphism (MFNLP), predominantly generate an RBEIII motif, a binding site for the RBF2 transcription factor (23). It was proposed that the MFNLP is a compensatory mechanism that ensures the presence of at least one functional RBEIII site in the LTR (24). Although a small number of such sequences have been deposited in the databases, MFNLP insertions have not previously been reported for the subtype C LTR.
The divergence of HIV-1 into several genetic subtypes, the uneven distribution of these subtypes across the globe, and the evident domination of certain subtypes, especially subtype C, over others raise two important questions. First, which molecular properties of the viral subtypes may underlie the wide range of biological differences observed? Second, are the HIV-1 subtypes likely to undergo additional evolutionary modifications in the future?
In a previous screen of 607 primary clinical viral isolates collected at several clinics in southern India in 2000-2003, 34 viral strains containing sequence insertions in the LTR were identified (25). A subsequent sequence analysis of the viral enhancer in 25 of the 34 viruses showed that the sequence insertions in the C-LTR created at least two different types of TFBS, predominantly NF-κB (12 of 25) and, in a minority of the strains, the RBEIII site (6 of 25) (26). Additionally, a few viral strains (4 of 25) contained sequence insertions that broadly resembled a canonical NF-κB motif and are hence referred to here as κB-like sites. Two of the 25 sequences contained insertions of both the RBEIII and the κB-like motifs. A single viral strain contained a dual insertion of a canonical NF-κB site and an RBEIII-like site. Our previous analysis thus identified five different types of sequence insertions in the subtype C viral promoter. It is, however, not known whether any of the promoter variant viral strains is endowed with altered infectivity and/or pathogenic properties.
Here, we demonstrate that HIV-1 subtype C appears to be undergoing further modification of the viral promoter through the acquisition of additional TFBS in the enhancer region, possibly gaining higher replication fitness. Importantly, no associated variations were found in any of the viral gene products, including all of the structural, regulatory, and accessory proteins. At least three different promoter variant strains of subtype C emerged independently in India and South Africa in the early 1980s (data not shown), and possibly at other geographical locations, and they appear to have been replacing the standard subtype C strains at a substantial pace. Among these, the viral strains containing four κB-binding sites in the enhancer show the fastest expansion in India. The emergence of subtype C variants containing four κB sites is not merely of academic interest; given the extensive recombination characteristic of HIV-1, it may have implications for viral fitness, pathogenesis, and overall viral evolution.

EXPERIMENTAL PROCEDURES
Ethics Statement-Ethical clearance for this study was obtained from the institutional human ethics committees at the …
Clinical Samples-The study participants were all adults over 18 years of age, of both genders (83/214 females, 39%), believed to have acquired the infection primarily through heterosexual transmission; a large majority were drug-naive (166/214, 78%) (supplemental Table 1). A single vial of 6-8 ml of peripheral blood was collected from each participant after obtaining informed consent. The samples from St. John's Hospital (n = 44) and Freedom Foundation (n = 30), both in Bengaluru, the YRG Centre for AIDS Research and Education, Chennai (n = 55), and the All India Institute of Medical Sciences, New Delhi (n = 30), were collected during 2010-2011. For the southern Indian cohort, a total of 607 samples were collected during 2000-2003 from multiple urban centers (25). Samples for the Jawaharlal Nehru Centre for Advanced Scientific Research cohort, comprising 57 volunteers, were collected between 2006 and 2007 (27).
Amplification, Cloning, and Bioinformatics Analysis of the LTR Sequences-The amplification and cloning of the LTR sequences from the patient isolates were performed as described previously (26). Briefly, the full-length LTR or its U3 region was amplified from proviral DNA or plasma viral RNA, respectively. The full-length LTR sequences amplified from proviral DNA were directionally cloned between the MluI and EcoRI restriction sites of the reporter vector pcDNA3.1(+) luciferase-IRES-EGFP, thus substituting the CMV promoter with the amplified LTR. Each viral promoter was sequenced in both directions using multiple plasmid clones for each clinical sample. Primers N1007 (5′-TTAAGCTACAAGGCAAGGC-3′) and N1009 (5′-GTTGTTCTCGGTGGGCTTGG-3′) were used to sequence the LTR insert. Alternatively, viral RNA was extracted from frozen plasma samples using a commercial kit (catalog no. 740956, Nuclisense viral RNA isolation kit, Macherey-Nagel). First-strand cDNA was synthesized using random hexamers and SuperScript II reverse transcriptase (catalog no. 18064-022, Invitrogen). A 400-bp fragment in the U3 region of the LTR was amplified using primers N1031 (5′-GCTTCTTTTTAAAAGAAAAGGGGGGACTGGA-3′) and N1024 (5′-TGTACTGGGTCTCTCTAGGTAGA-3′). The PCR products were purified and sequenced directly using primer N698. DNA sequencing was performed on an ABI 377 automated sequencer using the ABI PRISM™ dye terminator cycle sequencing ready reaction kit. Every LTR sequence was subjected to BLAST analysis against the global and laboratory sequence databases to confirm authenticity. The sequences were deposited in the GenBank™ database; patient details and GenBank™ accession numbers are provided in supplemental Table 1.
Of the 3,054 full-length LTR sequences from all HIV-1 genetic subtypes, either downloaded from the databases or generated in this work, 479 contained insertions upstream of the viral enhancer. Of the 1,113 subtype C sequences, 213 contained such insertions. Multiple sequence alignment was performed using ClustalW in the BioEdit software. Transcription factor-binding sites in the LTR sequences were identified using the TF Search program.
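The survey counts above can be turned into proportions with simple arithmetic. The sketch below uses only the four counts given in the text. The pooled non-subtype C figure is derived here by subtraction and is not a value reported in the study.

```python
# Counts taken from the LTR survey described in the text.
TOTAL_LTRS = 3054            # full-length LTRs, all HIV-1 subtypes
TOTAL_WITH_INSERT = 479      # LTRs with an insertion upstream of the enhancer
SUBTYPE_C_LTRS = 1113        # subtype C LTRs
SUBTYPE_C_WITH_INSERT = 213  # subtype C LTRs with such an insertion


def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


all_subtypes = percent(TOTAL_WITH_INSERT, TOTAL_LTRS)
subtype_c = percent(SUBTYPE_C_WITH_INSERT, SUBTYPE_C_LTRS)
# Derived by subtraction (not reported in the text): all non-C subtypes pooled.
non_c = percent(TOTAL_WITH_INSERT - SUBTYPE_C_WITH_INSERT,
                TOTAL_LTRS - SUBTYPE_C_LTRS)

print(f"all subtypes: {all_subtypes}%, subtype C: {subtype_c}%, non-C: {non_c}%")
```

This prints 15.7% for all subtypes, 19.1% for subtype C, and 13.7% for the pooled non-C sequences.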
Cell Culture-Jurkat and CEM-CCR5 cells were cultured in RPMI 1640 medium (catalog no. R0883, Sigma) supplemented with 10% fetal bovine serum (catalog no. 04-222-1, Biological Industries, Kibbutz Beit Haemek, Israel), 2 mM glutamine, 100 units/ml penicillin G, and 100 μg/ml streptomycin. Peripheral blood mononuclear cells (PBMC) were purified from 10 ml of fresh blood from healthy donors by density gradient centrifugation using Ficoll-Hypaque (catalog no. 10771, Sigma). CD8+ cells were depleted from the PBMC using the StemSep Human CD8+ depletion kit (catalog no. 14662, Stem Cell Technologies, Vancouver, Canada). PBMC were cultured for 3 days in complete RPMI medium supplemented with 10 units/ml interleukin-2 (IL-2, catalog no. 136, AIDS Reagent Program, National Institutes of Health) and 5 μg/ml phytohemagglutinin P (catalog no. L9132, Sigma). They were then used for viral infection in medium supplemented with IL-2 alone. HEK293 and 293T (human embryonic kidney), EL4, and TZM-bl cells were grown in Dulbecco's modified Eagle's medium (catalog no. D5546, Sigma) supplemented with 10% FBS.
Electrophoretic Mobility Shift Assay and the Supershift Assay-For preparation of the nuclear extract, Jurkat cells (50 × 10⁶) were suspended in 1 ml of ice-cold 1× phosphate-buffered saline (PBS: 137 mM NaCl, 2.7 mM KCl, 4.3 mM Na2HPO4, 1.47 mM KH2PO4, pH 7.4) in a 1.5-ml plastic vial and centrifuged at 500 × g for 5 min at 4°C. The PBS was aspirated, and the cells were resuspended in 500 μl of sucrose buffer (0.32 M sucrose, 10 mM Tris-HCl, pH 8.0, 3 mM CaCl2, 2 mM MgOAc, 0.1 mM EDTA, 0.5% Nonidet P-40, 1 mM dithiothreitol (DTT), and 0.5 mM phenylmethanesulfonyl fluoride (PMSF)). Using a 1-ml wide-bore plastic tip, the cells were mixed gently by pipetting the suspension up and down several times. The cell suspension was centrifuged at 500 × g for 5 min at 4°C, and the supernatant was aspirated. The nuclear pellet was washed once with 1 ml of cold sucrose buffer lacking Nonidet P-40 and then gently dispersed by pipetting up and down with a 1-ml wide-bore plastic tip. The nuclei were resuspended in 150 μl of low-salt buffer (20 mM HEPES, pH 7.9, 1.5 mM MgCl2, 20 mM KCl, 0.2 mM EDTA, 25% glycerol, 0.5 mM DTT, and 0.5 mM PMSF) and vortexed gently. An equal volume of high-salt buffer (20 mM HEPES, pH 7.9, 1.5 mM MgCl2, 800 mM KCl, 0.2 mM EDTA, 25% glycerol, 1% Nonidet P-40, 0.5 mM DTT, 0.5 mM PMSF, and 4.0 μg/ml each of leupeptin, aprotinin, and pepstatin) was then added dropwise while the contents were mixed with a plastic tip. The samples were incubated for 30-45 min at 4°C with gentle mixing on a rotator and then centrifuged at 14,000 × g for 15 min at 4°C. The protein content of the extracts was determined using a commercial BCA protein assay kit (catalog no. 23227, Pierce). The nuclear extract was snap-frozen in liquid nitrogen as 50-μl aliquots and stored in a deep freezer until use.
Gel Shift Assays-For the electrophoretic mobility shift assays (EMSA), 10 μg of nuclear extract was mixed with 30,000 cpm of labeled probe in binding buffer (10 mM HEPES, pH 7.9, 4% glycerol, 25 mM KCl, 1 mM DTT, 0.5 mM EDTA, and 25 mM NaCl) in the presence of poly(dI-dC) (1 μg/ml; catalog no. P4929-5UN, Sigma) and BSA (1 μg/ml). The samples were incubated for 20 min at room temperature and resolved on a 6% polyacrylamide gel at 150 V in a cold room. Competition experiments were performed with a 100-fold excess of unlabeled oligonucleotides. For the supershift assay, nuclear extracts were preincubated with 1 μg of affinity-purified rabbit polyclonal antibodies raised against p50, p65, c-Rel, p52, or RelB. After 30 min of incubation, radiolabeled probes were added, and electrophoresis was performed as described above. Gels were dried and exposed to Kodak Biomax film at −85°C. Anti-Rel antibodies were raised in New Zealand White rabbits against each of the five Rel family members using synthetic peptides conjugated to the keyhole limpet hemocyanin carrier. Antigen-specific antibodies were purified by peptide affinity chromatography on immunoaffinity columns. The peptide sequences used to raise the antibodies were as follows: p50 (NH2-YNPGLDGIIEYDDFKLNSSIC-COOH), p65 (NH2-QASALAPAPPQVLPQAPAPAC-COOH), c-Rel (NH2-FQVFLPDEHGNLTTALPPVVC-COOH), p52 (NH2-AEDDPYLGRPEQMFHLDPSLC-COOH), and RelB (NH2-NHGSGPFLPPSALLPDPDFFSGTVSC-COOH). The specificity of the affinity-purified antibodies was confirmed by Western blot and EMSA analyses.
Protein Expression and Purification-The prokaryotic expression plasmid pGEX CD-p50 was a generous gift from Dr. Neil D. Perkins (University of Dundee, Scotland, UK). The full-length p50-GST fusion protein was expressed and purified as described (29). Briefly, p50-GST protein was expressed in Escherichia coli BL21(DE3) cells. The cells were grown in LB medium supplemented with 100 μg/ml ampicillin for 3-5 h at room temperature until they reached an optical density of 0.5 at 600 nm and were then induced for 1 h with 1 mM isopropyl 1-thio-β-D-galactopyranoside. The induced cells were pelleted, washed, and resuspended in GST-binding buffer (20 mM HEPES, pH 7.5, 10 mM MgCl2, 20% glycerol, 0.1% Nonidet P-40, 1 mM DTT, protease inhibitor mixture (catalog no. P8465, Sigma), and 0.5 mM PMSF). The bacterial suspension was sonicated in a Sonics Vibra-Cell sonicator at an amplitude of 40% for five cycles of a 10-s pulse alternating with 10 s of rest on ice. The lysate was centrifuged at 45,000 × g for 30 min to remove cell debris. The supernatant was mixed with 500 μl of glutathione-Sepharose 4B beads (catalog no. 17-0756-01, GE Healthcare) pre-equilibrated in GST-binding buffer (a 50:50 slurry of beads in GST-binding buffer). The lysate and beads were tumbled on a rotator for 2 h at 4°C. The suspension was centrifuged at 100 × g, and the supernatant was carefully aspirated without disturbing the beads. The beads were washed three times with 20 ml of GST-binding buffer each time, and the slurry was transferred to a fresh 1.5-ml plastic vial. Elution was performed using 500 μl of GST elution buffer (10 mM reduced glutathione, 50 mM Tris-HCl, pH 8.0, and 5% glycerol). The beads were incubated with the elution buffer for 15 min on ice, the vial was centrifuged at 100 × g, and the supernatant was transferred to a fresh vial.
The supernatant was dialyzed against ITC buffer (10 mM sodium phosphate, pH 7.0, 150 mM NaCl, 1 mM EDTA, and 5 mM DTT) overnight at 4°C with three buffer changes. Following dialysis, the protein concentration was determined using a commercial kit (catalog no. 23225, BCA protein assay kit, Pierce), and the sample was stored in aliquots at −20°C until required.
ITC-The ITC measurements were recorded using an ITC200 instrument (MicroCal, Inc.). Titrations were performed at 25°C in ITC buffer (10 mM sodium phosphate, 150 mM NaCl, 1 mM Na2EDTA, and 5 mM DTT, pH 7.0). A microcuvette was loaded with 240 μl of 10 μM GST-tagged recombinant p50 protein, and 2-μl aliquots of the H-κB, F-κB, or mutant double-stranded oligonucleotides (100 μM) were injected into the cell using a syringe. The protein solutions were dialyzed against ITC buffer before the assay to minimize the contribution of the heat of dilution to the binding heat. Injections were made at 120-s intervals, and each injection lasted 0.4 s. A stirring speed of 1,000 rpm was maintained throughout the assay to ensure proper mixing after each injection. The heat change was plotted against the molar ratio of the titrants and analyzed using the Origin software supplied with the manufacturer's ITC data analysis package.
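For orientation, the nominal oligonucleotide-to-p50 molar ratio reached after a given number of injections follows directly from the concentrations and volumes above. This is a simplified sketch. It assumes the effective cell volume equals the 240 μl loaded and ignores the dilution and displacement corrections that the instrument software applies. The injection counts in the loop are illustrative, because the total number of injections is not stated.

```python
def molar_ratio(n_injections: int,
                inj_volume_ul: float = 2.0,
                ligand_uM: float = 100.0,
                cell_volume_ul: float = 240.0,
                protein_uM: float = 10.0) -> float:
    """Nominal ligand:protein molar ratio after n injections.

    Simplifying assumption: ignores dilution of the cell contents and the
    volume displaced from the cell by each injection.
    """
    # microlitres x micromolar = picomoles
    ligand_pmol = n_injections * inj_volume_ul * ligand_uM
    protein_pmol = cell_volume_ul * protein_uM
    return ligand_pmol / protein_pmol


# Illustrative injection counts only.
for n in (1, 6, 12, 19):
    print(n, round(molar_ratio(n), 3))
```

Under these assumptions each 2-μl injection adds 200 pmol of duplex to 2,400 pmol of protein, so a 1:1 ratio is nominally reached after 12 injections.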
Chromatin Immunoprecipitation Assay (ChIP)-Jurkat cells (5 × 10⁷) were infected with 5,000 infectious units of VSV-G-pseudotyped pcLGIT viruses and harvested at 72 h. Before harvesting, cells were left untreated or treated with tumor necrosis factor-α (TNF-α, 20 ng/ml) for 1 h. The cells were resuspended in 10 ml of 1× PBS, and 37% formaldehyde stock solution was added dropwise with continuous mixing to a final concentration of 1%. After 10 min of incubation at room temperature, the cross-linking reaction was stopped by adding 2.5 M glycine stock solution to a final concentration of 0.125 M. The cells were centrifuged at 2,000 rpm for 5 min at 4°C, the supernatant was aspirated, and the cells were washed once with cold PBS. The cells were placed on ice and resuspended in 0.5 ml of lysis buffer (5 mM PIPES, pH 8.0, 85 mM KCl, 0.5% Nonidet P-40). Protease Inhibitor Mixture (catalog no. 11836170001, Roche Applied Science) was added to this buffer just before use. The tube was centrifuged at 2,000 rpm for 5 min at 4°C, and the crude nuclear fraction was resuspended in 500 μl of radioimmunoprecipitation assay buffer supplemented with the protease inhibitors. The nuclear suspension, in 100-μl fractions, was sonicated using the Sonics Vibra-Cell sonicator for 10 cycles at 10% amplitude, using 15-s on and 15-s off pulses for a duration of 5 min with 1-min intervals. DNA was sheared to an average size of 200-700 bp, as confirmed by electrophoresis. The sonicated mixture was centrifuged at 14,000 rpm for 15 min at 4°C, and the supernatant was collected. Immunoprecipitation was performed using 5 μg each of anti-p50 (sc-8414X, Santa Cruz Biotechnology), anti-p65 (sc-8008X), and IgG (negative control, generated in-house) polyclonal antibodies.
A 240-bp region flanking the κB and Sp1 elements in the LTR was amplified from the immunoprecipitated DNA using the primer pair N1054 (5′-GAAGTATTAAAGTGGAAGTTTGACATTC-3′) and N1056 (5′-AGAGACCCAGTACAGGCGAAAAGC-3′). A 300-bp PCR fragment of the cellular IκB-α promoter (N1057, 5′-GACGACCCCAATTCAAATCG-3′, and N1058, 5′-TCAGGCTCGGGGAATTTCC-3′) was used as an internal control for NF-κB recruitment in the assay. Chromatin isolated from Jurkat cells in the absence of retroviral infection served as a negative control (data not shown).
We performed ChIP analyses for RNA polymerase II and its phosphorylated form to confirm the recruitment of transcription-competent complexes to the viral promoters. Immunoprecipitation was performed using 5 μg each of antibodies specific to the unphosphorylated RNA polymerase II C-terminal domain (CTD) repeat (YSPTSPS, ab5408, Abcam, Cambridge, UK) and the Ser-2-phosphorylated CTD repeat (phospho-Ser-2, ab5095). An IgG1 antibody (PP64B, Upstate) was used as an isotype-negative control. The presence of unphosphorylated RNA polymerase II on the viral promoter was confirmed using the primer pair N1054 and N1056. The presence of Ser-2-phosphorylated RNA polymerase II was confirmed ~3,000 bp from the transcription start site (TSS) by amplifying a 300-bp PCR fragment in the Tat-coding sequence using primers N1140 (5′-TCCAGTCCACAACCATGGATGGAGCCAGTAGATCCTAAC-3′) and N1141 (5′-GGGCCCCTCGAGCTAAGTCGAAGGGGTCTGTCTC-3′). A 173-bp fragment of the cellular GAPDH promoter was amplified using primers N2014 (5′-TACTAGCGGTTTTACGGGCG-3′) and N2015 (5′-TCGAACAGGAGGAGCGAGAGCGA-3′) as an internal control for Ser-2 phosphorylation of RNA polymerase II. The recruitment of NFAT transcription factors to the viral promoters was confirmed using the primer pair N1054 and N1056 and 5 μg each of anti-NFAT1 (ab2722), anti-NFAT2 (ab2796), or IgG isotype control (PP64B, Upstate) antibodies for immunoprecipitation. As a positive control for NFAT recruitment, a 147-bp fragment of the cellular TNF-α promoter containing an NFAT-binding site was amplified using primers N2016 (5′-AGGATGGGGAGTGTGAGGG-3′) and N2017 (5′-CCTTGGTGGAGAAACCCATGAGCTCATCT-3′) (30).
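The stock volumes needed for the ChIP cross-linking and quenching steps can be worked out by requiring the final mixture, not the starting volume, to reach the target concentration. Those steps are 37% formaldehyde to 1% final, then 2.5 M glycine to 0.125 M final, in cells suspended in 10 ml of PBS. This is a minimal sketch. It assumes additive volumes and treats the nominal 37% formaldehyde stock as if it diluted linearly by volume. The helper name is hypothetical.

```python
def volume_to_add(start_volume_ml: float, stock_conc: float, final_conc: float) -> float:
    """Volume of stock to add so that the *final* mixture reaches final_conc.

    Solves stock_conc * v = final_conc * (start_volume_ml + v), assuming the
    solute is initially absent and volumes are additive (an approximation).
    """
    if not 0 < final_conc < stock_conc:
        raise ValueError("final concentration must be positive and below the stock")
    return start_volume_ml * final_conc / (stock_conc - final_conc)


cells_ml = 10.0                                             # cells in 10 ml of PBS
formaldehyde_ml = volume_to_add(cells_ml, 37.0, 1.0)        # 37% stock -> 1% final
glycine_ml = volume_to_add(cells_ml + formaldehyde_ml, 2.5, 0.125)  # 2.5 M -> 0.125 M
print(f"formaldehyde: {formaldehyde_ml:.3f} ml, glycine: {glycine_ml:.3f} ml")
```

This prints roughly 0.278 ml of formaldehyde stock and 0.541 ml of glycine stock. Solving for the final volume avoids the small undershoot that a plain C1V1 = C2V2 calculation on the starting volume would give.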
Construction of the Reporter Vectors-A dual-reporter vector, pLTR-sLuc-IRES-EGFP, that simultaneously expresses two reporter genes, secreted Gaussia luciferase (sLuc) and enhanced green fluorescent protein (EGFP), was constructed as follows. Gaussia luciferase was amplified from pCMV-Gluc (Nanolights, NanoLight Technologies, Panama) with primers N712 (5′-CCAGCCGAATTCACCATGGGAGTCAAAGTTCTG-3′) and N713 (5′-GGCCGCGGATCCTTAGTCACCACCGGCCCCCTT-3′) and cloned directionally into pIRES-EGFP (BD Biosciences) between the EcoRI and BamHI sites. The luciferase-IRES-EGFP cassette was moved directionally into the pcDNA3.1(+) vector using the NotI and EcoRI sites. The CMV promoter was then replaced, between the MluI and EcoRI restriction sites, by the full-length 656-bp HIV-1 subtype C LTR amplified from patient BL42 (GenBank™ accession number HQ202921) using primers N698 and N854.
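A quick way to confirm that cloning primers carry the intended restriction sites is to scan them for the recognition sequences. The sketch below uses a hypothetical helper, not part of the described workflow. It checks N712 and N713 for the EcoRI, BamHI, MluI, and NotI sites named in this section. All four sites are palindromic, so scanning the top strand is sufficient.

```python
from __future__ import annotations

# Recognition sequences (5'->3') of the enzymes named in this section.
SITES = {
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
    "MluI": "ACGCGT",
    "NotI": "GCGGCCGC",
}

PRIMERS = {
    "N712": "CCAGCCGAATTCACCATGGGAGTCAAAGTTCTG",
    "N713": "GGCCGCGGATCCTTAGTCACCACCGGCCCCCTT",
}


def find_sites(seq: str) -> list[tuple[str, int]]:
    """Return (enzyme, 0-based start) for every recognition site in seq."""
    hits = []
    for enzyme, site in SITES.items():
        start = seq.find(site)
        while start != -1:
            hits.append((enzyme, start))
            start = seq.find(site, start + 1)
    return sorted(hits, key=lambda hit: hit[1])


for name, seq in PRIMERS.items():
    print(name, find_sites(seq))
```

When run as written, it reports an EcoRI site six bases into N712 and a BamHI site six bases into N713. This is consistent with the directional EcoRI/BamHI cloning described above.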
One group of the promoter variant viral strains of subtype C from India contains four functional NF-κB-binding sites in the viral enhancer. These are referred to here as the 4-κB viral strains, as opposed to the standard subtype C viral strains that contain only three such sites, designated here as the 3-κB strains. Furthermore, the four NF-κB-binding sites in the subtype C viral promoter fall into three genetically distinct types, which we label accordingly for clarity. The two genetically identical canonical κB sites (5′-GGGACTTTCC-3′) found in all HIV-1 strains are designated "H-κB" sites. The Sp1-proximal variant κB site (5′-GGGGCGTTCC-3′), unique to subtype C and not found in any other HIV or simian immunodeficiency virus, is designated the "C-κB" site. The fourth variant κB site (5′-GGGACTTTCT-3′) lies further upstream, was inserted through sequence duplication, and is the focus of this work; it is designated the "F-κB" (F for fourth) site. The LTR from viral isolate BL42 was selected because it represents all the subtype-specific molecular properties, including the presence of the F-κB site. A panel of isogenic variant LTR reporter vectors was generated from this parental vector using an overlap PCR strategy. The 22-bp tandemly duplicated sequence containing the F-κB site was deleted from the native viral promoter (FHHC-LTR, see under "Results") to generate the HHC-LTR, which represents the standard configuration of the subtype C LTR. Additionally, two viral LTRs containing H- or F-κB sites at all four locations (HHHH and FFFF) were constructed to examine transactivation from viral enhancers homogeneous for the κB sites. In all of these 4-κB LTRs, the sequence context flanking the individual NF-κB-binding sites is identical to that of the original viral promoter. A null vector lacking all NF-κB-binding sites was also constructed.
Furthermore, for comparison, two additional reporter vectors were made containing viral promoters derived from NL4-3 (subtype B, accession number M19921) or Indie C1 (subtype C, accession number AB023804) molecular clones.
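Because the H-, C-, and F-κB sites are defined by exact 10-bp sequences, an enhancer can be annotated by exact string matching on both strands. This is a minimal sketch under that assumption. It is not how the TF Search program scores sites. The toy enhancer below is hypothetical (FHHC order with arbitrary 3-nt spacers), not the BL42 sequence.

```python
from __future__ import annotations

# The three kappaB variants defined in the text (top strand, 5'->3').
KB_MOTIFS = {
    "H": "GGGACTTTCC",  # canonical site found in all HIV-1 strains
    "C": "GGGGCGTTCC",  # Sp1-proximal, subtype C-specific variant
    "F": "GGGACTTTCT",  # fourth site acquired by sequence duplication
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase ACGT sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def scan_kb_sites(enhancer: str) -> list[tuple[int, str, str]]:
    """Return sorted (position, site type, strand) for exact matches on either strand."""
    enhancer = enhancer.upper()
    hits = []
    for label, motif in KB_MOTIFS.items():
        for strand, pattern in (("+", motif), ("-", revcomp(motif))):
            pos = enhancer.find(pattern)
            while pos != -1:
                hits.append((pos, label, strand))
                pos = enhancer.find(pattern, pos + 1)
    return sorted(hits)


# Hypothetical toy enhancer (NOT the BL42 sequence).
toy = "GGGACTTTCT" + "GCT" + "GGGACTTTCC" + "GCT" + "GGGACTTTCC" + "GCT" + "GGGGCGTTCC"
sites = scan_kb_sites(toy)
print(sites)
print("".join(label for _, label, _ in sites))  # site order, 5'->3'
```

On the toy sequence this recovers the order FHHC. Exact matching is deliberately strict: H- and F-κB differ only at their last base, so any search tolerating one mismatch would report both at the same position.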
Analysis of the Reporter Gene Expression-Jurkat cells were transiently transfected using Lipofectamine 2000 transfection reagent (catalog no. 11668-019, Invitrogen). The cells were seeded into 48-well plates at a density of 5 × 10⁵ cells/well in 400 μl of RPMI 1640 medium supplemented with 10% fetal calf serum. A 500-ng plasmid DNA pool was prepared in 50 μl of serum-free RPMI medium. It contained 300 ng of one of the reporter plasmids, 100 ng of the pGL3 plasmid expressing Firefly luciferase (catalog no. E1741, Promega), and 100 ng of the subtype C Tat-expression vector. For the Tat-minus transfection, 100 ng of pcDNA3.1(+) plasmid was used as carrier DNA. One microliter of Lipofectamine was mixed with 49 μl of serum-free RPMI medium to prepare the lipid transfection reagent, which in turn was mixed with the 50-μl plasmid pool. The plasmid/lipid mixture was incubated for 20 min at room temperature and then added to the appropriate wells. Twelve hours after transfection, the cells were washed to remove the lipid complexes and resuspended in 500 μl of complete RPMI medium. For cell activation, the transiently transfected Jurkat cells (500 μl) were incubated without or with one or a combination of the following activators: TNF-α (20 ng/ml; catalog no. 210-TA-010, R&D Systems, Minneapolis, MN), phorbol 12-myristate 13-acetate (PMA, 20 ng/ml, catalog no. 8139, Sigma), PHA (5 μg/ml, catalog no. L9132, Sigma), or anti-CD3 plus anti-CD28 antibodies (1:1 bead to cell ratio, catalog no. 111.31D, Invitrogen). The BioLux Gaussia luciferase assay kit (catalog no. E3300L, New England Biolabs) was used to measure the Gaussia luciferase secreted into the culture supernatant at 24 h. The luciferase assay was performed on a SpectraMax L luminescence 96-well microplate reader (MDS Inc., model s/n Lu 03094, Sunnyvale, CA) by mixing 25 μl of the culture supernatant with an equal volume of 1× BioLux GLuc substrate reagent.
The experiments were performed in triplicate wells, and every experiment was repeated at least twice. Transfection efficiency was monitored by measuring Firefly luciferase expression in the cell extracts using a commercial kit (Bright-Glo luciferase assay kit, catalog no. E2620, Promega Corp.) according to the manufacturer's instructions. The primary data were normalized to transfection efficiency.
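The normalization step can be sketched as a per-well ratio of the secreted Gaussia signal to the Firefly signal, followed by a fold change relative to a reference construct. The ratio-based scheme and the readings below are assumptions for illustration only. The text states just that the data were normalized for transfection efficiency.

```python
from __future__ import annotations

from statistics import mean


def normalize(gluc: list[float], fluc: list[float]) -> list[float]:
    """Divide each well's Gaussia signal by its Firefly (transfection) signal."""
    if len(gluc) != len(fluc):
        raise ValueError("each well needs both a Gaussia and a Firefly reading")
    return [g / f for g, f in zip(gluc, fluc)]


def fold_over_reference(test: tuple, reference: tuple) -> float:
    """Mean normalized activity of a test construct relative to a reference."""
    return mean(normalize(*test)) / mean(normalize(*reference))


# Hypothetical triplicate readings (arbitrary light units): (Gaussia, Firefly).
hhc = ([1000, 1200, 1100], [100, 120, 110])
fhhc = ([2800, 2500, 3300], [100, 100, 110])
print(normalize(*fhhc))
print(round(fold_over_reference(fhhc, hhc), 2))
```

Taking the ratio well by well, before averaging, keeps a poorly transfected well from being credited with the efficiency of its neighbors.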
Construction of the Viral Molecular Clones "Isogenic" for the F-κB Site in the LTR-The viral molecular clone Indie-C1 was used to construct the "isogenic" HHC and FHHC paired viruses. Except for the 22-bp difference in the viral promoter, the paired viruses have an identical genetic context, and for convenience we refer to them as an isogenic viral pair. Throughout this report, "isogenic" denotes viral promoters or infectious viral clones that are genetically identical except for the presence or absence of the 22 residues constituting the F-κB motif in the viral enhancer. Using overlap PCR, the original LTR at the 3′ end of Indie was replaced with the LTR derived from the primary viral isolate BL42 in three successive steps. This procedure also introduced an MluI site immediately upstream of the 3′-LTR to permit subsequent LTR exchanges. First, the 4.1-kb BlpI fragment spanning base pairs 4709-8840 was deleted from Indie, and the backbone was self-ligated to make the BlpI site unique. Second, two PCRs were performed: one amplified the 230-bp fragment between the unique BlpI site and the beginning of the 3′ LTR, and the other amplified the BL42 LTR. The first PCR used primers N1075 (5′-GAGAGAATGAGACGAGCTGAGCCAG-3′) and N1077 (5′-ACGCGTCCCCCCTTTTCTTTTAAAAAGAAGCTG-3′). The second PCR used primers N1076 (5′-GAAAAGGGGGGACGCGTGGAAGGGTTAATTTACTC-3′) and N1078 (5′-AGCTCCACCGCGGTGGCGGCCGCAC-3′). The internal primers N1076 and N1077 introduced the MluI site, and the two PCR products shared a 17-bp overlap. A third-round overlap PCR was performed using the first- and second-round PCR products as templates and the outer primers N1075 and N1078.
The product of the third-round PCR was cloned directionally between the BlpI and SacII sites of the Indie intermediate vector from step 1 above. Finally, the original 4.1-kb BlpI fragment from Indie was restored into the BlpI site, and clones containing the insert in the correct orientation were identified by restriction analysis. As a consequence of the MluI site introduction, the recombinant clones encode an additional alanine (GCG) at position 98 of Nef. The recombinant viral clones were verified for p24 production in HEK293T cells and for infectivity in CEM-CCR5 cells.
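The 17-bp overlap between the two first-round PCR products, and the MluI site (ACGCGT) carried by the internal primers, can be checked directly from the primer sequences. The first product's top strand ends with the reverse complement of N1077, and the second product begins with N1076. The overlap is therefore the longest suffix of one that matches a prefix of the other. The helper names in this sketch are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase ACGT sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def longest_overlap(left: str, right: str) -> int:
    """Length of the longest suffix of `left` that equals a prefix of `right`."""
    for k in range(min(len(left), len(right)), 0, -1):
        if left[-k:] == right[:k]:
            return k
    return 0


MLUI = "ACGCGT"
N1076 = "GAAAAGGGGGGACGCGTGGAAGGGTTAATTTACTC"  # forward primer, BL42 LTR product
N1077 = "ACGCGTCCCCCCTTTTCTTTTAAAAAGAAGCTG"    # reverse primer, BlpI-to-LTR product

# Top strand of product 1 ends with revcomp(N1077); product 2 begins with N1076.
overlap = longest_overlap(revcomp(N1077), N1076)
print(overlap, MLUI in N1076, MLUI in N1077)
```

With these primers the function returns 17, matching the overlap stated in the text. The shared 17 bases (GAAAAGGGGGGACGCGT) end in the MluI site.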
Using a similar strategy, we engineered the 3′ LTR of the EcoHIV molecular clone to generate viruses isogenic for the F-κB site. The EcoHIV molecular clone is essentially a subtype B virus carrying an LTR derived from NL4-3 at the 3′ end. We replaced the B-LTR at the 3′ end with that of BL42, a subtype C viral strain, and engineered an MluI site immediately upstream of the 3′ LTR. A 1,558-bp fragment spanning the XhoI and NaeI sites was assembled in three steps using overlap PCR. A 188-bp fragment spanning the nef-LTR junction was amplified with primers N1333 (5′-GAGCAGTATCTCGAGACCTAGAAAAACATGGAGC-3′) and N1335 (5′-TTAGCCCTTCCACGCGTCCCCCCTTTTCTTTTAAAAAGT-3′) to introduce the MluI site. The full-length LTR of BL42 was amplified with primers N1334 (5′-AAAGGGGGGACGCGTGGAAGGGCTAATTCACTCCCA-3′) and N1337 (5′-CCTCCTAGCTAGCCCGCGGGTGCTAGAGATTTTCCACACTG-3′). A 736-bp fragment of the vector backbone downstream of the 3′ LTR was amplified with primers N1336 (5′-AAATCTCTAGCACCCGCGGGCTAGCTAGGAGGTAGAGGTTGCAG-3′) and N1338 (5′-CCGCACAGCCGGCTCTGTGTGACTTACTCTT-3′). Successive overlap PCRs using the amplified products generated a 1,538-bp fragment that was cloned directionally between the XhoI and NaeI sites. As a result of engineering the MluI site, an additional alanine residue was introduced into Nef at position 98. The recombinant viral clones were confirmed for p24 production in HEK293T cells and for infectivity in EL4 cells.
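As a quick sanity check on the cloning design, the sketch below scans each EcoHIV primer for the recognition sequences of the three enzymes named above (XhoI CTCGAG, MluI ACGCGT, NaeI GCCGGC). It is a hypothetical helper, not the authors' workflow; it assumes the line-break hyphens in the printed primers are artifacts, and it scans only one strand because all three sites are palindromic.

```python
# Hypothetical helper: report which of the enzymes named in the text
# have a recognition site within each EcoHIV primer (5'->3').
# All three sites are palindromic, so scanning one strand is sufficient.

RECOGNITION_SITES = {
    "XhoI": "CTCGAG",
    "MluI": "ACGCGT",
    "NaeI": "GCCGGC",
}

PRIMERS = {
    "N1333": "GAGCAGTATCTCGAGACCTAGAAAAACATGGAGC",
    "N1335": "TTAGCCCTTCCACGCGTCCCCCCTTTTCTTTTAAAAAGT",
    "N1334": "AAAGGGGGGACGCGTGGAAGGGCTAATTCACTCCCA",
    "N1338": "CCGCACAGCCGGCTCTGTGTGACTTACTCTT",
}


def sites_in(primer: str) -> list[str]:
    """Return the sorted names of enzymes whose recognition site occurs in primer."""
    seq = primer.upper()
    return sorted(name for name, site in RECOGNITION_SITES.items() if site in seq)


if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        print(name, sites_in(seq))
```

Under these assumptions the expected result is XhoI for N1333, MluI for N1335 and N1334, and NaeI for N1338, consistent with the fragment boundaries described in the text.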
Preparation of the Viral Stocks-HEK293T cells were transiently transfected with different viral molecular clones using the standard calcium phosphate protocol (31). Cells were seeded in a 90-mm dish at low confluency and transfected with 10 μg of the viral plasmid DNA along with 0.2 μg of the CMV-EGFP expression vector, the latter serving as an internal control for transfection efficiency. Culture supernatants were harvested at 72 h, passed through a 0.22-μm filter, and stored in a deep freezer in multiple 1-ml aliquots. The p24 levels of the viral stocks were evaluated using a commercial ELISA kit (catalog no. NEK050B, HIV-1 p24 ELISA kit, PerkinElmer Life Sciences). The infectious titer of the viral stocks was determined using TZM-bl cells. Briefly, 10⁴ TZM-bl cells were seeded in 100 μl of DMEM in a flat-bottom 96-well culture plate. After 24 h, 100 μl of serially diluted viral stocks (4-fold serial dilutions) in complete DMEM supplemented with 10 μg/ml Polybrene were added to the appropriate wells. Following 2 h of incubation, the medium in the wells was replaced with 200 μl of complete DMEM, and the plates were incubated for 2 days at 37°C in the presence of 5% CO2. To examine β-galactosidase expression, on day 3 the culture medium in each well was replaced with 100 μl of a fixing solution (1.0% formaldehyde and 0.2% glutaraldehyde in PBS), and the plates were incubated for 5 min at room temperature. The cells were washed twice with PBS; 100 μl of freshly prepared β-galactosidase staining solution (4 mM potassium ferrocyanide, 4 mM potassium ferricyanide, 2 mM MgCl2, and 1 mM X-gal in PBS) was added to each well, and the plates were incubated for 2-3 h at 37°C. Plates were washed twice in PBS, and the blue-stained cells were counted manually under a low-power microscope. The infectious units of each viral stock were determined by multiplying the cell count by the dilution factor.
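The titer arithmetic in the last step can be written out as a short sketch. The helpers are hypothetical; the sketch assumes that each 4-fold dilution step multiplies the dilution factor by 4 starting from an undiluted well, and that counts come from a single 100-μl inoculum well. The per-ml conversion is an added convenience, not a step stated in the protocol.

```python
# Sketch of the TZM-bl titer arithmetic (hypothetical helpers).
# Assumptions: 4-fold serial dilution starting from an undiluted well,
# and blue cells counted in a single 100-ul inoculum well.


def dilution_factors(n_wells: int, fold: int = 4) -> list[int]:
    """Dilution factors for a serial dilution: 1, 4, 16, ... (4-fold by default)."""
    return [fold ** i for i in range(n_wells)]


def infectious_units(blue_cells: int, dilution_factor: int) -> int:
    """Infectious units in the inoculum = blue-cell count x dilution factor."""
    return blue_cells * dilution_factor


def titer_per_ml(blue_cells: int, dilution_factor: int, inoculum_ul: float = 100.0) -> float:
    """Convert infectious units per inoculum volume (in ul) to infectious units per ml."""
    return infectious_units(blue_cells, dilution_factor) * 1000.0 / inoculum_ul
```

For example, a hypothetical well at the 1:64 dilution containing 35 blue cells corresponds to 35 × 64 = 2,240 infectious units in the 100-μl inoculum, or 22,400 infectious units/ml.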
Pseudotyped lentiviral vectors used in the ChIP analysis were packaged in HEK293T cells. Cells seeded in 90-mm culture dishes were transfected using the Ca2+-phosphate transfection protocol with a total of 20 μg of plasmid DNA consisting of 10 μg of the pcLGIT reporter virus containing one of the variant LTRs at the 3′ end (HHC, FHHC, FFFF, or HHHH), 5 μg of pMDLg/pRRE, 3.5 μg of pVSV-G, and 1.5 μg of pRSV-Rev. The infectious units of the pseudotyped viruses were determined as described above.
Heteroduplex Tracking Assay (HTA)-The LTR-HTA consisted of a nested PCR that, in the second round, amplified fragments of 340 or 362 bp spanning the U3-R region from 3-κB and 4-κB viruses, respectively. The primer pairs N558 (5′-TGGAAGGGTTAATTTACTCTAAGGAAAGGAAAGAGATCCTTG-3′) and N424 (5′-GACACCAARGAAGCYTTAGAYAARATAGAG-3′) were used in the first round, and N419 (5′-GATGGTGCTTCAAGCTAGTRCCAGTTGA-3′) and N1024 (5′-TGTACTGGGTCTCTCTAGGTAGA-3′) in the second round. A homologous 336-bp fragment amplified from the NL4-3 (subtype B) LTR using the primer pair N419 and N1024 was used as a probe to form distinct heteroduplex complexes with the FHHC and HHC PCR fragments. When the heteroduplexes were resolved in a polyacrylamide gel, the two competing viral strains could be unequivocally distinguished (supplemental Fig. 4) and quantitated using a phosphorimager. To prepare the probe, the reverse primer N1024 was end-labeled using 30 μCi of [γ-32P]dATP (catalog no. LCP-101, Board of Radiation and Isotope Technology, India) and used in combination with the unlabeled forward primer N419 to amplify the B-LTR fragment. The amplified PCR fragment was gel-purified and used as the probe in the HTA. For the HTA, the PCR products of the clinical samples were column-purified (catalog no. 28104, QIAquick PCR purification kit, Qiagen India, New Delhi, India), and the DNA concentration of each sample was determined by UV spectrophotometry and confirmed by agarose gel electrophoresis. A typical 25-μl HTA reaction consisted of 100-125 ng of the amplified DNA hybridized to ~1,000 cpm of the B-LTR probe in annealing buffer (100 mM Tris-HCl, pH 7.8, 100 mM NaCl, and 2 mM EDTA). The samples were incubated at 95°C for 3 min and snap-chilled by placing the reaction vials on wet ice.
Three μl of HTA loading dye (50% glycerol, 0.02 M Tris-Cl, 0.5 M DTT, 0.25% bromphenol blue, 0.25% xylene cyanol) was added to each tube, and the tubes were vortexed and centrifuged at 12,000 rpm for 1 min. The entire HTA reaction mixture was loaded onto an 8% polyacrylamide gel of 0.75-mm thickness, and the DNA heteroduplexes were resolved in ~4 h at 500 V using the Protean II xi electrophoresis system (Bio-Rad). The gel was placed on Whatman chromatography paper, wrapped in plastic film, dried, and scanned using a phosphorimager (FLA-5000, Fujifilm, Japan). Band intensities were quantified using ImageJ software. The HTA analysis was slightly modified for the clinical samples: because monoinfection controls cannot be included for natural infections, unlike in the experimental models, we directly compared the band intensities to measure viral dominance.
Viral Proliferation and the Pairwise Viral Competition Assay-CEM-CCR5 cells or PBMC (5 × 10⁶) were infected with viral mixtures representing different ratios of the 3-κB or 4-κB isogenic strains at an approximate m.o.i. of 0.0001 (see schematic diagram in Fig. 6B). The infectious titers of the viral stocks were determined as described above, and the competing viruses were used either at equal multiplicity of infection (1:1,500 infectious units each) or with one virus at an approximately 10-fold higher m.o.i. than the other (50 units of one virus and 450 units of the other, a 1:9 ratio). Monoinfections (500 infectious units/assay) were also included in the assays for comparison. The PBMC were depleted of CD8+ cells and activated with PHA for 72 h prior to viral infection. The cells were incubated with the viruses in complete RPMI medium supplemented with 10 μg/ml Polybrene for 2 h at 37°C, washed three times in PBS, and incubated in RPMI 1640 complete medium. The cultures were monitored, the medium was replenished twice a week, and fresh cells (activated PBMC from the same donor) were added as required. The secretion of p24 into the culture medium was monitored periodically using a commercial kit (PerkinElmer Life Sciences), and a viral growth curve was constructed for each monoinfection. Two weeks after infection, genomic DNA was extracted from the cells using a commercial kit (QIAamp blood mini kit, catalog no. 69504, Qiagen India, New Delhi, India), and 250 ng of the DNA was used in the LTR-HTA. For the EcoHIV-mouse infection model, each animal (four female BALB/c mice per group) received intraperitoneally 5 μg p24 equivalent of the virus generated in HEK293T cells. Monoinfections and dual infections were established essentially as described above for the T-cell infection. Genomic DNA was extracted from the splenocytes 2 weeks after viral infection.
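The multiplicity of infection is the number of infectious units per target cell, so the inocula described above can be checked with simple arithmetic. The helper names in this sketch are hypothetical; the numbers come from the text.

```python
# Illustrative m.o.i. and inoculum arithmetic (hypothetical helper names).


def moi(infectious_units: float, cells: float) -> float:
    """Multiplicity of infection = infectious units per target cell."""
    return infectious_units / cells


def split_inoculum(total_units: float, parts_a: int, parts_b: int) -> tuple[float, float]:
    """Divide a total inoculum between two competing viruses in a parts_a:parts_b ratio."""
    units_a = total_units * parts_a / (parts_a + parts_b)
    return units_a, total_units - units_a
```

`moi(500, 5e6)` gives 0.0001, the approximate m.o.i. quoted above, and `split_inoculum(500, 1, 9)` returns (50.0, 450.0), the unequal pairing used in the competition assays (a 9-fold rather than exactly 10-fold difference). The same helper gives 0.001 for the 5,000 infectious units used in the post-entry analysis.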
The relative replicative fitness of the competing viral strains was evaluated essentially as described previously (32) and as depicted in supplemental Fig. 4A. Briefly, the production of each viral strain was expressed as the ratio of its band intensity in the coinfection to that in the corresponding monoinfection, and this ratio was then normalized to the sum of the ratios of the two competing viruses. For instance, production of the 3-κB virus was calculated as the ratio of the band intensity in the coinfection ("c") to that in the monoinfection ("a"). The relative fitness of the 3-κB viral strain, $W_3$, was then expressed as the 3-κB ratio divided by the sum of the 3-κB and 4-κB ratios, $W_3 = (c/a)/(c/a + b/d)$. Similarly, the relative fitness of the 4-κB viral strain was expressed as $W_4 = (b/d)/(c/a + b/d)$, so that $W_3 + W_4 = 1$.
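The fitness calculation can be expressed as a small function. The band labels follow the text for the 3-κB virus (a, c); for the 4-κB virus the sketch assumes, by analogy, that b is the coinfection band and d the monoinfection band. The function name and the intensities in the example are hypothetical.

```python
# Relative fitness from HTA band intensities, following the ratios in the text:
#   W3 = (c/a) / (c/a + b/d),  W4 = (b/d) / (c/a + b/d)
# Assumed labels: a = 3-κB band in the 3-κB monoinfection,
# c = 3-κB band in the coinfection, d = 4-κB band in the 4-κB monoinfection,
# b = 4-κB band in the coinfection (b and d assigned by analogy with c/a).


def relative_fitness(a: float, b: float, c: float, d: float) -> tuple[float, float]:
    """Return (W3, W4); the two values always sum to 1."""
    ratio_3kb = c / a  # 3-κB production: coinfection vs monoinfection
    ratio_4kb = b / d  # 4-κB production: coinfection vs monoinfection
    total = ratio_3kb + ratio_4kb
    return ratio_3kb / total, ratio_4kb / total
```

With hypothetical intensities a = 1,000, c = 200, d = 1,000, and b = 800, `relative_fitness(1000, 800, 200, 1000)` returns approximately (0.2, 0.8); because both values share the denominator c/a + b/d, W3 and W4 always sum to 1.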
Analysis of the Post-entry Events of the Viral Infection-CEM-CCR5 cells (5 × 10⁶) were infected independently with 5,000 infectious units (m.o.i. = 0.001) of the 3-κB or 4-κB isogenic viruses pretreated for 1 h with 10 units/ml of DNase I (catalog no. M0303S, New England Biolabs) and harvested at different time points for evaluation. Cellular DNA was extracted using a commercial kit (QIAamp Blood Mini Kit, Qiagen India Pvt. Ltd., New Delhi, India). Reverse transcription products were detected 12 h after infection using the primer pair N1734 (5′-TGTGTGCCCGTCTGTTGTGT-3′) and N1735 (5′-GAGTCCTGCGTCTAGAGGATC-3′), which amplifies a 142-bp fragment from the U5 region. A total of 250 ng of genomic DNA was used for real-time PCR analysis with a commercial kit (SensiFAST SYBR MasterMix kit, Bioline, London, UK). A standard curve was prepared using a 10-fold serial dilution of the pIndie-C1 plasmid ranging from 10⁸ copies to 1 copy, diluted in salmon sperm DNA solution (50 ng/μl). Real-time PCR for the detection of 2-LTR circles was performed 24 h after infection using the primer pair N1736 (5′-TGGTTAGACCAGATCTGAGCCT-3′) and N1737 (5′-AGGGTTGACCACTCCCAGTCCCGC-3′), which amplified a 223-bp fragment spanning the U5-U3 junction in the LTR. A 1,268-bp fragment representing the 2-LTR circle target was generated by PCR from the Indie LTR; a 10-fold serial dilution of this fragment ranging from 1 to 10⁸ copies was used to construct the standard curve, and the copy number of 2-LTR circles was determined by regression analysis. The number of viral integration events was determined 48 h after infection using a nested Alu-PCR strategy. The forward primer N1739 (5′-AGCTAGGGAACCCACTGCTTAAGC-3′) used in the first round of amplification is located in the R region of the LTR, and the reverse primer N1740 (5′-TGCTGGGATTACAGGCGTGAG-3′) in the Alu repeat elements.
The first round of PCR amplified a heterogeneous population of products whose lengths depended on the distance between each randomly integrated provirus and the nearest Alu sequence dispersed throughout the genome. The inner primer set amplified a 200-bp target within the R-U5 region of the LTR using the primer pair N1720 (5′-GTTAGACCAGATCTGAGCCT-3′) and N1741 (5′-GGTAGTGTGGAAAATCTCTAGCAG-3′). To generate a standard curve for the Alu-PCR, we infected HEK 293 cells with vesicular stomatitis virus G protein (VSV-G)-pseudotyped LGIT virus containing the Indie-C1 LTR. The cells were infected at a high m.o.i. and selected over a period of 1 month to ensure representation of diverse integration events. At the end of the selection period, the infected cell pool contained 1.4 ± 0.6 integrated copies per cell. Genomic DNA extracted from these cells was used to generate an integration standard curve ranging from 1 to 10³ copies/cell, and the number of viral integration events in the samples was determined by regression analysis against this curve. The magnitude of viral transcription from the 3-κB or 4-κB viral promoters was evaluated 48 h after infection. Cells were activated with 100 ng/ml TNF-α 1 h prior to harvesting or were left without activation. Total cellular RNA was extracted from 1 × 10⁶ cells using the RNeasy Plus kit (catalog no. 74124, Qiagen India, New Delhi, India). First-strand cDNA was synthesized using SuperScript II reverse transcriptase (catalog no. 18064-014, Invitrogen). Two different PCRs were then performed, one for the proximal viral transcripts within the TAR element and the other for distal transcripts in Tat located ~5.4 kb downstream of the transcription start site (see schematic diagram, Fig. 7A).
Proximal viral transcripts were detected with the primer pair N1720 (5′-GTTAGACCAGATCTGAGCCT-3′) and N1721 (5′-GTGGGTTCCCTAGTTAGCCA-3′), which amplified an 89-bp fragment in the TAR element. The primer pair N1728 (5′-TTGCGACAGAGAAGAGCAAG-3′) and N1729 (5′-GATACTTACTGCTTTGATAT-3′) amplified a 226-bp distal viral transcript in Tat. Transcription of the cellular β-actin gene was used to normalize expression from the viral promoters, using the primer pair N1730 (5′-GTCGACAACGGCTCCGGC-3′) and N1731 (5′-GGTGTGGTGCCAGATTTTCT-3′), which amplified a 239-bp fragment. Quantitative real-time PCR for each primer pair was performed using the comparative Ct (ΔΔCt) method on a Corbett Rotor-Gene 2000 real-time PCR machine (Corbett Lifescience, Hilden, Germany).
Determination of the CD4 T-cell Count and Plasma Viral Load-The CD4 cell count was determined in fresh blood samples within 4 h of sample collection using MultiTest antibodies, counting beads (340,499 and 349,480), and the Lyse-No-Wash protocol as recommended by the manufacturer (BD Biosciences). The samples were acquired on a BD FACSCalibur flow cytometer. Plasma was separated from whole blood within 6 h of collection and stored in a deep freezer as 0.5-ml aliquots. Plasma viral load was determined using a commercial kit (NucliSens EasyQ version 1. 1.1.1, bioMérieux, France), and plasma viral loads below the detection limit of <25 IU/ml were assigned a value of 25 IU/ml. A plasma sample of known viral load and a sample negative for the virus were included in each run, and the assay was validated only if these internal controls gave consistent results. Samples were thawed only once and used immediately for viral load estimation.
DECEMBER 28, 2012 • VOLUME 287 • NUMBER 53

Rapid Expansion of Variant Viral Strains in India and Worldwide-Previous analysis from our laboratory identified five different types of sequence insertion in the subtype C viral promoter between the transcription factor-binding sites RBEIII and NF-κB (26). The sequence insertion generated an additional binding site for the transcription factor NF-κB or RBF2, but not for both simultaneously. In the case of a dual insertion, typically only one of the two sites was intact, whereas the other contained variations expected to render it nonfunctional (NF-κB-like or RBEIII-like motifs). Samples (n = 607) for the previous analysis were collected nearly a decade ago, between 2000 and 2003, from all four southern states of India (33). In 2000-2003, the prevalence of three of the five promoter variant viral strains, containing insertions of NF-κB, κB-like, and RBEIII sites, was 2, 1, and 1%, respectively (Fig. 1A). To determine the contemporary prevalence of the subtype C LTR variant viral strains in a cross-sectional analysis, fresh blood samples were collected from four different clinics (three in southern and one in northern India) in 2010-2011 (supplemental Table S1). Sequences of the U3 region or the full-length LTR were determined from plasma viral RNA and proviral genomic DNA, respectively. The results indicated a rapid expansion of all three LTR variant viral strains at all four clinics (Fig. 1A). Of the variant strains, the 4-κB viruses expanded fastest, with their prevalence increasing from 2% in 2000-2003 to as high as 20-30% in 2010-2011. In a different, well defined clinical cohort (27), for which clinical samples were collected from another clinic (Seva Free Clinic, Bengaluru) in 2005, the prevalence of each of the three variants was intermediate: 5, 2, and 4% for κB, κB-like, and RBEIII insertions, respectively (Fig. 1A).
Importantly, near full-length sequences of 256 viral strains from South Africa, collected between 2000 and 2005, have been deposited in the databases (34), providing an opportunity to examine the molecular nature of the insertions in a different clinical cohort dominated by subtype C. Like the Indian samples, the South African samples contained all three types of insertions in the viral promoter (Fig. 1B). κB, κB-like, and RBEIII site insertions constituted 4.3, 5.1, and 8.2% of the samples, respectively. Furthermore, an analysis of the global HIV-1 sequences available in the databases identified κB site insertions from other countries, including China (n = 2/81), South Africa (n = 16/315), Tanzania (n = 11/58), Botswana (n = 3/72), Zambia (n = 2/52), and Spain (n = 2/117) (Fig. 2A). These data collectively suggest that the emergence of 4-κB viruses is a general feature of subtype C rather than one confined to a single geographical location. Because the 4-κB strains of subtype C expanded fastest in India, we next characterized the molecular and biological properties of this HIV-1 variant.
New NF-κB Site Characterized by a Unique Genetic Variation Is Exclusive to Subtype C-Multiple sequence alignments of a representative subset of 4-κB-containing C-LTR sequences from India and several similar sequences downloaded from the databases revealed several molecular properties unique to subtype C (Fig. 2A). Several of the Indian LTR sequences used for this analysis have been reported previously (26). A 4-κB subtype C promoter contains three types of genetically distinct κB sites, which for clarity are labeled accordingly. The two genetically identical canonical κB sites (5′-GGGACTTTCC-3′) found in all HIV-1 strains are designated here as H-κB sites. The Sp1-proximal variant κB site (5′-GGGGCGTTCC-3′), unique to subtype C and not found in any other HIV or simian immunodeficiency virus, is designated here as the C-κB site. The fourth, variant κB site (5′-GGGACTTTCT-3′), inserted through MFNLP, is designated here as the F-κB (for fourth) site. Careful examination of the sequence alignment revealed a tandem duplication of 22 residues, consisting of 9 residues of the H-κB II site and 12 upstream residues, inserted immediately upstream of the viral enhancer (Fig. 2A). The original and the duplicated sequences were separated by a single "T" residue that constituted the 10th position of the newly created F-κB site.
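The site nomenclature can be made concrete with a small motif scanner. This is an illustrative sketch, not an analysis used in the study: it matches only the exact 10-bp sequences, assumes the F-κB motif consists of the first nine residues of the H-κB site followed by the T described above (5′-GGGACTTTCT-3′), and is demonstrated on a synthetic string with arbitrary AAA spacers rather than a real LTR.

```python
# Toy illustration of the κB-site nomenclature (hypothetical helper names).
# Assumption: F-κB = first nine residues of H-κB followed by T.
KB_MOTIFS = {
    "H": "GGGACTTTCC",  # canonical H-κB site
    "C": "GGGGCGTTCC",  # subtype C-specific, Sp1-proximal C-κB site
    "F": "GGGACTTTCT",  # fourth (F-κB) site; C-to-T at position 10
}


def find_kb_sites(seq: str) -> list[tuple[int, str]]:
    """Return (0-based start, site label) for every exact motif match, 5' to 3'."""
    seq = seq.upper()
    hits = []
    for label, motif in KB_MOTIFS.items():
        start = seq.find(motif)
        while start != -1:
            hits.append((start, label))
            start = seq.find(motif, start + 1)
    return sorted(hits)


def enhancer_architecture(seq: str) -> str:
    """Concatenate site labels 5'->3', e.g. 'FHHC' for a 4-κB subtype C enhancer."""
    return "".join(label for _, label in find_kb_sites(seq))
```

On the synthetic string "AAA" + F + "AAA" + H + "AAA" + H + "AAA" + C + "AAA", `enhancer_architecture` returns "FHHC", the label used in this report for the 4-κB subtype C enhancer.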
The insertion of specific sequences in the viral promoter has been studied extensively in the context of the subtype B LTR (22) but not in any other viral subtype, including subtype C. Importantly, duplication of the NF-κB-binding site has not been reported previously in any HIV subtype, including subtypes B and C, although a few such LTR sequences are present in the extant databases. To examine whether an association exists between the viral subtype and the duplication of any of the three transcription factor-binding sites (NF-κB, NF-κB-like, or RBEIII), we performed an extensive search and downloaded several thousand LTR sequences belonging to the major genetic subtypes of HIV-1 from the databases. Of the 3,054 full-length LTR sequences belonging to all HIV-1 genetic subtypes, either from the databases or derived in this work, 479 (15.7%) contained sequence insertions upstream of the viral enhancer. We determined the percent prevalence of the different MFNLP in each viral subtype and found that acquisition of the F-κB site is an exclusive property of subtype C (Fig. 2B). Of the 1,113 subtype C sequences, 213 (19.1%) contained insertions, of which 81, 56, and 76 sequences were characterized as containing the F-κB (7.3%), κB-like (5.0%), and RBEIII (6.8%) sites, respectively. In contrast, viral subtypes A, B, D, F, and group O predominantly contained RBEIII site insertions. Subtype G, the circulating recombinant form CRF02_AG, and group N contained nearly equivalent numbers of RBEIII and κB-like or H-κB site insertions, although the sample size available for the analysis was small. The insertion of the 22-residue sequence and the creation of the C-to-T variation in the F-κB site may reflect the propensity of subtype C reverse transcriptase for sequence duplication and for nontemplated base addition at the growing end of the viral DNA in an RNA/DNA hybrid molecule.
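The percentages reported for subtype C follow directly from the counts, and the sketch below reproduces them as a consistency check (the helper name is hypothetical).

```python
# Consistency check of the reported insertion frequencies (counts from the text).


def percent(part: int, whole: int) -> float:
    """Percentage rounded to one decimal place, as reported in the text."""
    return round(100 * part / whole, 1)


SUBTYPE_C_TOTAL = 1113
SUBTYPE_C_INSERTIONS = {"F-κB": 81, "κB-like": 56, "RBEIII": 76}

if __name__ == "__main__":
    n_insertions = sum(SUBTYPE_C_INSERTIONS.values())
    print(n_insertions, percent(n_insertions, SUBTYPE_C_TOTAL))
    for site, n in SUBTYPE_C_INSERTIONS.items():
        print(site, percent(n, SUBTYPE_C_TOTAL))
    print(percent(479, 3054))  # insertions across all subtypes
```

It reports 213 insertions (19.1%), 7.3, 5.0, and 6.8% for the F-κB, κB-like, and RBEIII insertions, and 15.7% for the 479 of 3,054 sequences across all subtypes, matching the values in the text.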

F-κB Site Is Biologically Functional and Recruits p50-p65 Heterodimer-To determine whether the genetically variant F-κB site is biologically functional in the context of the viral promoter and recruits NF-κB, we used EMSA. Using radiolabeled double-stranded DNA probes representing the canonical H-κB site or the variant F-κB site, we asked whether cellular factors in Jurkat cell extracts bound to the DNA probes. Cell extracts were prepared from Jurkat T-cells with or without TNF-α activation for 60 min. Two distinct complexes were found under control conditions, and these complexes were enhanced severalfold following TNF-α activation (Fig. 3A, lanes 2 and 3). These specific complexes were out-competed with a 25-fold molar excess of cold probes representing the H-κB site, the F-κB site, or a κB site derived from the IκB cellular promoter (Fig. 3A, lanes 4-6), but not with a mutant F-κB probe (lane 7). The nature of the complexes was comparable between the H- and F-κB probes, suggesting similar function regardless of the variation. Furthermore, supershift analysis using affinity-purified rabbit antibodies identified p50-p65 heterodimers in the complexes but not other members of the Rel family (Fig. 3B). Probes containing swapped flanking sequences (H* and F*) produced comparable results, confirming that the flanking sequences did not influence NF-κB binding to the probes (supplemental Fig. 1A). Importantly, densitometric analysis of the supershifted p50 and p65 bands revealed comparable intensities between the H- and F-κB probes, suggesting similar binding affinities (supplemental Fig. 1B). As additional confirmation, we used isothermal titration calorimetry (ITC) to compare the binding affinities of a fixed quantity of recombinant p50 protein for the H- or F-κB double-stranded oligonucleotides as a function of increasing oligonucleotide concentration. The H- and F-κB oligonucleotides bound the recombinant protein with comparable affinities, displaying Kd values of 95 and 83 nM, respectively (Fig. 3C).

FIGURE 3. … A probe for the Oct-1 cellular factor was used as a loading control. Free probe (FP) and nonspecific (NS) and specific complexes are indicated. B, H- and F-κB probes bind the p50-p65 heterodimer in the supershift assay. Nuclear extracts prepared from control or TNF-α-treated Jurkat cells were preincubated with affinity-purified rabbit antibodies specific to the Rel family members indicated at the top of the lanes, and EMSA was performed as described in A. C, binding affinity of the H- and F-κB probes for recombinant p50 protein determined by isothermal titration calorimetry. Double-stranded oligonucleotides containing the H-, F-, or a mutant κB site were used in the assay. The raw ITC traces from a representative titration are presented relative to the baseline, and the heat change is plotted against the molar ratio of the titrated products. D, chromatin immunoprecipitation analysis. A schematic representation of the VSV-G-pseudotyped viruses expressing a dual-expression cassette of EGFP and Tat is illustrated in the upper panel. Jurkat cells were infected at an m.o.i. of 1 and, 72 h later, stimulated with TNF-α for 60 min (left panel) or left without activation (middle panel). Immunoprecipitation of the complexes was performed using 5 μg of the indicated antibodies. A reference IκB-α cellular promoter containing three NF-κB sites was used as a positive control for the ChIP assay. Enrichment of the NF-κB subunits and transcription-competent RNA polymerase II on the viral (HHHH- or FFFF-LTR) or the cellular promoter was evaluated by PCR. One-tenth of the input chromatin was uncross-linked and used as an input control. IgG, isotype-matched control antibody. Enrichment of the Ser-2-phosphorylated form of RNA polymerase II was evaluated by targeting the Tat region located ~3,000 bp downstream of the TSS of the viral promoters (right panel). The GAPDH cellular promoter was used as a positive control.
To compare NF-κB binding to the viral enhancer in the chromatin context of the integrated provirus, we performed a ChIP assay. Given the close proximity of the multiple, genetically distinct κB sites in the viral enhancer and the repetitive nature of the sequences flanking the F-κB site, it was not practically possible to design site-specific primers for the ChIP analysis. To circumvent this problem, we generated two different HIV-1 reporter viruses based on the previously reported pcLGIT vector (35). In these reporter viruses, the viral enhancer in the 3′ LTR was engineered to contain four tandem copies of either the H-κB site (HHHH) or the F-κB site (FFFF), with the spacer sequences and other molecular features maintained as in the wild-type 4-κB viruses (Fig. 3D, top panel). Daughter viruses produced from these vectors copy the 3′ U3 into the 5′ viral promoter, thus placing expression of the reporter gene EGFP and of the viral transactivator Tat under the control of the engineered LTRs. NF-κB binding to the viral enhancers could therefore be compared in a homogeneous κB-site context following viral integration into the chromatin. Importantly, the HHHH- and FFFF-LTRs are functional and drive reporter gene expression with efficiencies comparable with that of the wild-type FHHC-LTR (see below). VSV-G envelope-pseudotyped viruses prepared in HEK293T cells were used to infect Jurkat T-cells at an m.o.i. of 1, and 72 h later the cells were stimulated with TNF-α for 60 min or left without activation. Fragmented chromatin complexes were immunoprecipitated using antibodies specific to p50, p65, or the unphosphorylated CTD (Y1SPTSPS7) of RNA polymerase II (36). Isotype-matched control antibody was used as a negative control. PCR amplification was then performed to detect the NF-κB subunits and transcription-competent complexes on the viral (HHHH- or FFFF-LTR) or a control cellular IκB-α promoter (Fig. 3D). In the absence of TNF-α activation, only p50, but not p65, was recruited to the IκB-α, H-κB, and F-κB enhancers (Fig. 3D, lanes 2 and 3). Following cell activation, however, recruitment of both p50 and p65 was evident at all the promoters, at comparable levels between the H- and F-κB enhancers (Fig. 3D, lanes 7 and 8). Likewise, the unphosphorylated form of the RNA polymerase II complex was detected following TNF-α activation on both the viral and the cellular promoters (Fig. 3D, compare lanes 4 and 9). These PCR amplifications targeted a region proximal to the TSS on both the viral and cellular promoters. The Ser-2-phosphorylated form of the RNA polymerase II complex was detected on the viral promoters following TNF-α activation by amplifying a region ~3,000 bp downstream of the TSS (Fig. 3D, lane 15). This complex was also detected on the GAPDH promoter, a constitutive cellular promoter (36), regardless of cellular activation status (Fig. 3D, lanes 12 and 15). Furthermore, end-labeled H- or F-κB DNA probes bound NF-κB from Jurkat cell nuclear extracts with comparable affinities in Southwestern blot analysis (data not shown). Collectively, these results suggest that the F-κB site, despite the variation at position 10, behaves functionally like the canonical H-κB site with respect to NF-κB binding.

F-κB-binding Site Confers a Quantitative Gain-of-Function Advantage on the Subtype C LTR-To evaluate the possible effect of the number and/or genetic differences of H- and F-κB motifs on gene expression from the viral enhancer, we constructed a number of dual-reporter gene expression vectors (37). We used a representative full-length LTR of 656 bp (FHHC) from the primary subtype C viral isolate BL42-02 (accession number HQ202921), which, apart from the F-κB insertion, represents the typical subtype C architecture (Fig. 4A). Using site-directed mutagenesis, the 22-bp insertion containing the F-κB site was deleted from the viral enhancer to generate HHC-LTR, the typical context of the subtype C promoter. Two additional viral LTRs homogeneous for the H- or F-κB sites (HHHH and FFFF) were also made to examine how a viral enhancer homogeneous for the κB sites would function. In all the above 4-κB LTRs, the sequence context flanking the individual NF-κB-binding sites is identical to that of the parental viral promoter. A null vector devoid of all the NF-κB sites was also constructed. Furthermore, for comparison, two additional reporter vectors containing the viral promoters of the subtype B (NL4-3) and subtype C (Indie-C1) reference molecular clones were made. Using these isogenic LTR reporter vectors, we examined luciferase expression in Jurkat cells under different conditions of cell activation, in the absence (Fig. 4B) or presence (Fig. 4C) of Tat, or under synergistic activation conditions (Fig. 4D).
In the absence of Tat, all the viral promoters were responsive to extracellular stimulation, with the response to PMA being the strongest. FHHC-LTR consistently expressed significantly higher levels of the reporter gene than HHC-LTR under all conditions of cell activation (Fig. 4B, p < 0.001). Importantly, HHHH-LTR, but not FFFF-LTR, performed at efficiencies comparable with that of the FHHC viral promoter. The underperformance of the FFFF-LTR, however, was a consequence of the loss of the overlapping NFAT site due to the C-to-T variation (supplemental Fig. 2). In the presence of Tat, the profile of reporter gene expression remained essentially the same, except that the magnitude increased 5-10-fold (Fig. 4C). With the exception of the FFFF-LTR, the gene expression patterns of the viral promoters were quite comparable when the cells were exposed to combinations of two different activators (Fig. 4D). Nevertheless, compared with the null vector, the FFFF-LTR drove a significant level of reporter gene expression (p < 0.001). Collectively, these data demonstrate a quantitative gain-of-function of the C-LTR due to the acquisition of the F-κB site. The gain, although moderate, was statistically significant. The natural context of κB site genetic variation and the order in which these sites are arranged also appear to have biological significance.
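Each reporter assay was run in triplicate and reported as mean relative light units ± S.D. (Fig. 4 legend). The sketch below shows that summary and a fold-change comparison such as FHHC-LTR versus HHC-LTR. It assumes the sample standard deviation (n - 1 denominator) and uses made-up readings; the helper names are hypothetical, and the statistical test behind the reported p values is not reproduced.

```python
from statistics import mean, stdev


def summarize_triplicate(rlu: list[float]) -> tuple[float, float]:
    """Mean and sample standard deviation (n - 1) of replicate luciferase readings."""
    return mean(rlu), stdev(rlu)


def fold_over_reference(test_rlu: list[float], reference_rlu: list[float]) -> float:
    """Ratio of mean signals, e.g. FHHC-LTR relative to HHC-LTR under one condition."""
    return mean(test_rlu) / mean(reference_rlu)
```

With hypothetical HHC-LTR readings of 100, 110, and 90 RLU and FHHC-LTR readings of 200, 220, and 180 RLU, the summaries are 100 ± 10 and 200 ± 20 RLU, and the fold change is 2.0.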

Role of NF-κB in HIV Subtype C Evolution
To determine whether the genetic difference between H- and F-κB sites could lead to a qualitative gain-of-function, several T-cell activation signaling pathways were examined using small molecule inhibitors. To this end, three different small molecule inhibitors, rottlerin, staurosporine, and Gö6976, were used to block T-cell signaling mediated through different protein kinases (38). Jurkat cells were transfected with the LTR reporter vectors and activated with PMA in the absence or presence of one of the inhibitors, and luciferase expression was monitored (supplemental Fig. 3, left panel). All the viral promoters showed enhanced transactivation following activation and a significant reduction in the presence of the inhibitors. The pattern of gene modulation was comparable between the FHHC- and HHC-LTRs apart from quantitative differences. Likewise, TNF-α-induced reporter gene expression was efficiently blocked by leptomycin in an identical manner for all the variant viral promoters, including FHHC, HHC, and HHHH (supplemental Fig. 3, right panel). Collectively, inhibition of cell signaling in Jurkat cells or HEK293 cells (data not shown) failed to identify notable differences between the H- and F-κB sites. However, given that HIV-1 can infect a wide range of target cells in the natural context, a more thorough analysis would be required to evaluate the biological significance of the C-to-T variation in the F-κB site.

FIGURE 4. F-κB site confers quantitative gain-of-function on the C-LTR. A, schematic representation of the dual-expression vectors. The isogenic LTRs originated from a representative subtype C LTR, BL42 (FHHC). Reference LTRs from subtype B (NL4-3) and subtype C (Indie-C1) were also included for comparison. B, induced reporter gene expression from the viral promoters in the absence of Tat. Jurkat cells were transfected with one of the reporter vectors illustrated in A and 12 h later were subjected to diverse activation conditions as indicated; luciferase secreted into the medium was evaluated at 24 h. Each assay was performed in triplicate wells, and the data are presented as mean relative light units ± S.D. The data are from one of three representative experiments. C, induced reporter gene expression from the viral promoters in the presence of Tat. Jurkat cells were co-transfected with a plasmid pool containing a Tat expression vector and one of the reporter vectors. D, induced reporter gene expression from the viral promoters under synergistic activation conditions. Jurkat cells were treated as in B and subjected to combinations of two or three different activation conditions as indicated.

Subtype C 4-κB Viral Strains Out-compete the 3-κB Isogenic Counterparts in Experimental Systems and in Natural Infection-
To examine the replication fitness of the 4-κB viral strains, paired infectious viruses with or without the F-κB site in the enhancer, but otherwise having an isogenic background (hereafter referred to as isogenic), were generated using the standard subtype C molecular clone Indie-C1 (Fig. 5A, top panel). Replication kinetics of the two viruses in CEM-CCR5 T-cells (Fig. 5B) or in PBMCs (Fig. 5C) from several healthy donors demonstrated the superiority of the 4-κB viral strain at every time point, and this difference remained significant throughout the study period (p < 0.001). Furthermore, the replicative fitness of the two viral strains was compared in pairwise competition assays at different m.o.i. ratios, as depicted schematically (supplemental Fig. 4A). The heteroduplex tracking assay (HTA) for the LTR was used to monitor the competing viral strains, essentially as described previously for the HIV-1 envelope (32). The LTR-HTA consisted of a nested PCR whose second round amplified a 340- or 362-bp fragment spanning the viral enhancer from the 3- and 4-κB viral strains, respectively (supplemental Fig. 4B). The PCR fragments were hybridized to an end-labeled 330-bp DNA probe amplified from the identical location in the subtype B LTR of the NL4-3 virus. When the heteroduplexes are resolved in a polyacrylamide gel, the two competing viral strains can be unequivocally distinguished (supplemental Fig. 4C) and quantified by phosphorimaging; the relative proliferation of each virus was then calculated as depicted (see mathematical formula, supplemental Fig. 4D). Of note, the HTA detection strategy employed here eliminates possible experimental artifacts at two different levels. First, the primer pairs bind identical sites on both of the competing viruses, thus eliminating amplification differences.
Second, the same probe binds both of the amplified PCR products under identical experimental conditions in the same vial, thus eliminating hybridization differences. Thus, because it does not discriminate between the competing viral strains, the HTA detection format lends greater confidence to the data, in addition to being quantitative (supplemental Fig. 6). In pairwise competition assays at an equivalent m.o.i., the 4-κB virus out-competed its 3-κB counterpart at week 2 in CEM-CCR5 cells (Fig. 5D) or PBMCs (Fig. 5E), and the differences were statistically significant (p < 0.05 by two-tailed Student's t test). When PBMCs were infected at an equal m.o.i. (1:1) and examined using HTA 48 h after infection, the magnitude of infection by the 3- and 4-κB viral strains was comparable, suggesting that the 4-κB virus did not dominate its counterpart at the early step of viral entry but only at later stages (supplemental Fig. 5).
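The competition readout can be illustrated with a short calculation. The sketch below assumes that strain proportions are taken from the two heteroduplex band intensities, optionally divided by hypothetical per-strain calibration factors, and that fitness is expressed as a ratio of ratios (odds of the 4-κB strain at the end of culture divided by its odds at the start), a form commonly used in HIV competition assays. It is not the formula of supplemental Fig. 4D, and the function names and band values are illustrative only.

```python
def strain_fractions(band_4kb, band_3kb, cal_4kb=1.0, cal_3kb=1.0):
    """Return (fraction_4kB, fraction_3kB) from two heteroduplex band intensities.

    cal_4kb and cal_3kb are hypothetical per-strain calibration factors, e.g. the
    signal per unit input measured in monoinfection controls; 1.0 means no correction.
    """
    a = band_4kb / cal_4kb
    b = band_3kb / cal_3kb
    total = a + b
    if total <= 0:
        raise ValueError("at least one band must have a positive intensity")
    return a / total, b / total


def relative_fitness(frac_4kb_end, frac_4kb_start):
    """Ratio-of-ratios fitness of the 4-kB strain relative to the 3-kB strain.

    Values > 1 mean the 4-kB strain increased its share during the culture.
    """
    for f in (frac_4kb_end, frac_4kb_start):
        if not 0.0 < f < 1.0:
            raise ValueError("fractions must lie strictly between 0 and 1")
    odds_end = frac_4kb_end / (1.0 - frac_4kb_end)
    odds_start = frac_4kb_start / (1.0 - frac_4kb_start)
    return odds_end / odds_start


if __name__ == "__main__":
    # Hypothetical example: 1:1 input (equal m.o.i.), 600:400 band signal at week 2.
    f4, f3 = strain_fractions(600.0, 400.0)
    print(round(f4, 2), round(f3, 2), round(relative_fitness(f4, 0.5), 2))  # 0.6 0.4 1.5
```

A value of 1.0 would indicate no change in the strain balance over the culture period.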
The pairwise competition experiments were also performed in an experimental animal model, the EcoHIV-mouse system (39). In EcoHIV, which is based predominantly on the prototype subtype B NL4-3 molecular clone, HIV-1 gp120 was replaced by gp80 of the ecotropic murine leukemia virus, a retrovirus that infects only rodents. The chimeric virus EcoHIV can productively infect immunocompetent mice and establish a spreading viral infection that can be detected and quantified using real-time PCR. The EcoHIV challenge system offers a low-cost experimental animal model for evaluating antiretroviral drugs and examining vaccine efficacy (40,41). The EcoHIV clone was further engineered to replace the original subtype B LTR at the 3′ end with subtype C LTRs isogenic for the F-κB site (Fig. 6A). Each animal received 5 μg of p24 equivalent of the virus, generated in HEK293T cells, intraperitoneally. Monoinfections and dual infections were established essentially as described above for the T-cell infection. Because viral infection typically peaked around week 2, genomic DNA was extracted from the splenocytes 2 weeks after infection, and the HTA detection strategy was used to determine fitness differences between the viral strains in vivo. As in the in vitro system, the 4-κB virus out-competed its 3-κB counterpart when both viruses were inoculated into the mice at equivalent p24 concentrations (Fig. 6B, p < 0.001). The data from both the in vitro and in vivo models collectively confirm that the 4-κB virus exhibits superior replication fitness due to the acquisition of the F-κB motif. Importantly, given the isogenic nature of the competing viruses, the replication advantage can be ascribed entirely to the presence of the F-κB site in the viral enhancer.
The HTA detection strategy can efficiently differentiate between the 3- and 4-κB viral strains in a mixed infection. Taking advantage of this, several clinical samples containing mixed infections with both 3- and 4-κB viruses were examined to determine which of the two viruses would be the dominant strain in natural infection. The HTA analysis was performed as described above, with one difference. Because monoinfection controls, unlike in the experimental models, are not available in natural infection, the mathematical formula could not be applied. Therefore, we directly compared the band intensities of the competing viral variant strains to measure viral domination. Given the uniform hybridization of the probe to either of the viral LTRs in the HTA (supplemental Figs. 5 and 6), the direct comparison of band intensities should faithfully represent the natural distribution of the competing viral strains in the clinical sample. In a subset of six samples randomly selected from the 2000-2003 cohort, in assays for proviral DNA, the band intensities representing the 4-κB viral strains were significantly higher than those representing the 3-κB viruses (Fig. 6C, p < 0.001). Importantly, in each of the clinical samples, the 4-κB viral variant was the dominant partner. From a subset of three clinical samples, we cloned the 3- and 4-κB LTRs into plasmid vectors, determined their sequences, and examined the phylogenetic relationship between the variant viral pairs. The 3- and 4-κB viral sequences clustered according to the clinical sample of origin, confirming their patient-specific identity (supplemental Fig. 7). This analysis also ruled out the possibility that the 3-κB amplification was a laboratory contamination; in the case of contamination, all the 3-κB sequences would be expected to cluster together in the phylogenetic analysis.
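For the clinical samples, where no monoinfection control exists, the comparison reduces to the share of the 4-κB band in each sample. As a hedged illustration of why a uniform direction of dominance across samples is informative, the sketch below pairs that share with an exact two-sided sign test. The sign test is a swapped-in example and not necessarily the statistic behind the reported p value; the band values are hypothetical.

```python
from math import comb


def share_4kb(band_4kb, band_3kb):
    """Fraction of the total HTA signal contributed by the 4-kB heteroduplex band."""
    return band_4kb / (band_4kb + band_3kb)


def sign_test_two_sided(n_dominant, n_total):
    """Exact two-sided sign test against a 50:50 null of no dominance."""
    k = max(n_dominant, n_total - n_dominant)
    tail = sum(comb(n_total, i) for i in range(k, n_total + 1)) / 2 ** n_total
    return min(1.0, 2.0 * tail)


if __name__ == "__main__":
    # Hypothetical (4-kB, 3-kB) band intensities for six coinfected samples.
    samples = [(820, 310), (640, 290), (905, 120), (510, 330), (700, 450), (380, 210)]
    shares = [share_4kb(a, b) for a, b in samples]
    n_dom = sum(s > 0.5 for s in shares)
    print(n_dom, sign_test_two_sided(n_dom, len(samples)))  # 6 0.03125
```

The sign test uses only the direction of dominance in each sample; a test on the intensities themselves would also use the size of the differences.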
Viral amplification from genomic DNA represents the viral species archived in the proviral compartment, not the active virus present in the plasma. We therefore extended the HTA analysis to plasma viral RNA extracted from three clinical samples available from the 2005 Jawaharlal Nehru Centre for Advanced Scientific Research cohort. The 4-κB virus was the dominant virus in the plasma viral RNA in all three clinical samples (Fig. 6D, left panel, p < 0.001). Additionally, in subject S189, for whom two plasma samples collected 1 year apart were available, the 4-κB virus remained dominant over this period, suggesting that the replicative difference established between the two viruses is stable over extended periods (Fig. 6D, right panel).

4-κB LTR Manifests Superior Magnitude of Transcription Initiation and Elongation-NF-κB plays a critical role in regulating basal level transactivation from the viral promoter in the absence of Tat and overall gene expression in its presence. Acquisition of an additional, functional κB site would therefore be expected to confer a significant replication advantage on the 4-κB viral strains of subtype C. To understand at which stage of the viral life cycle the 4-κB viral strains might achieve a replication advantage, CEM-CCR5 T-cells were infected independently with the HHC or FHHC isogenic viral strains (Fig. 7A), and viral proliferation was compared at four different stages using real-time PCR. Equivalent TCID50 units of the viruses were used for cell infection. We compared the generation of reverse transcription products in the cell extract, two-LTR circle formation in the nucleus, the extent of viral integration using Alu-PCR, and the formation of proximal versus distal viral transcripts, as depicted schematically (Fig. 7A). No significant differences were found between the two viruses at the level of reverse transcription (Fig. 7B), nuclear translocation (Fig. 7C), or proviral integration (Fig. 7D). A significant difference, however, was evident at the level of transcription, at both initiation and elongation (Fig. 7E). Two different PCRs were performed: one for the proximal viral transcripts within TAR and the other for distal transcripts in Tat, located ~5.4 kb downstream of the transcription start site (see the schematic diagram, Fig. 7A). Significantly higher levels of both proximal and distal viral transcripts were generated from the 4-κB LTR in unactivated cells, suggesting stronger basal promoter activity. TNF-α activation of the cells induced 1-2 orders of magnitude higher gene expression from both viral promoters; however, the 4-κB LTR manifested significantly higher transcription than the 3-κB LTR.
Overall, the data suggest that the acquisition of an additional κB-binding site by the HIV-1 LTR significantly enhanced promoter strength. In summary, the viral promoter containing four κB-binding sites enhanced transcription at two levels, initiation and elongation, whereas no differences were detected at the other phases of the viral life cycle examined.
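The text does not state how the real-time PCR signals were normalized. One common approach, shown here only as an assumption, is the comparative Ct (2^-ΔΔCt) method, which presumes roughly 100% amplification efficiency for both the viral target and a cellular reference amplicon. The function name and Ct values are hypothetical.

```python
def fold_change_ddct(ct_target_test, ct_ref_test, ct_target_ctrl, ct_ref_ctrl):
    """Comparative Ct fold change of a target in a test sample versus a control.

    Each Ct of the viral target is first normalized to a reference amplicon
    (delta Ct); the test and control delta Cts are then compared (delta-delta Ct).
    Assumes ~100% efficiency, i.e. one Ct unit corresponds to a 2-fold difference.
    """
    delta_test = ct_target_test - ct_ref_test
    delta_ctrl = ct_target_ctrl - ct_ref_ctrl
    return 2.0 ** (-(delta_test - delta_ctrl))


if __name__ == "__main__":
    # Hypothetical distal (Tat-region) transcript Cts: 4-kB-infected cells as the test,
    # 3-kB-infected cells as the control, both normalized to the same reference gene.
    print(fold_change_ddct(24.0, 18.0, 25.0, 18.0))  # 2.0
```

Because a lower Ct means more template, a 1-cycle earlier signal in the test sample corresponds to a 2-fold higher transcript level under this assumption.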
Higher Plasma Viral Load and Comparable CD4 Cell Count Are Associated with the 4-κB Viral Infection-If the 4-κB LTR can generate more viral transcripts, then HIV-1 infections with the 4-κB strains may lead to higher plasma viral loads than 3-κB viral infections. To test this hypothesis, plasma viral loads and CD4 cell counts were measured at a single time point in 60 subjects with 3-κB and 20 subjects with 4-κB viral infections. Subjects infected with the 4-κB strains had evidently higher mean plasma viral loads than those infected with the 3-κB viruses (Fig. 8A), and the difference between the means was statistically significant (p < 0.01). Of note, there was considerable overlap between the two groups, and the data scatter was large. Importantly, a similar comparison of CD4 cell counts failed to detect a statistically significant difference between the two groups (Fig. 8B). The significantly higher plasma viral loads in the 4-κB infections suggest a greater probability of viral transmission, because transmission correlates directly with plasma viral load (42,43).
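Viral loads span orders of magnitude, so group comparisons are usually made on log10-transformed values. The sketch below computes a Welch t statistic and its Welch-Satterthwaite degrees of freedom for two groups of unequal size, such as the 60 and 20 subjects here. Whether the reported p < 0.01 came from this test is an assumption, and the toy values are not study data; a p value would then follow from the t distribution, for example via SciPy's `scipy.stats.ttest_ind(a, b, equal_var=False)`.

```python
import math
import statistics


def log10_loads(copies_per_ml):
    """Convert plasma viral loads (copies/mL) to log10 units."""
    return [math.log10(c) for c in copies_per_ml]


def welch_t(group_a, group_b):
    """Return (t, df) for Welch's unequal-variance t test of mean(b) - mean(a)."""
    na, nb = len(group_a), len(group_b)
    va = statistics.variance(group_a) / na  # squared standard error of mean(a)
    vb = statistics.variance(group_b) / nb  # squared standard error of mean(b)
    t = (statistics.mean(group_b) - statistics.mean(group_a)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df


if __name__ == "__main__":
    # Toy log10 values only, not study data.
    t, df = welch_t([4.0, 4.5, 5.0], [5.0, 5.5, 6.0])
    print(round(t, 3), round(df, 1))  # 2.449 4.0
```

Welch's form does not assume equal variances, which matters when one group is three times the size of the other and the scatter is large.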

Positive Selection of Variant Viral Strains Captured at the Population Level-Current results suggest the emergence of at least three different, novel viral strains of subtype C in India, South Africa, and probably other geographical locations where this viral subtype is dominant (Fig. 1). The common theme underlying the emergence of the variant viral strains is the acquisition of an additional TFBS, primarily an NF-κB or an RBEIII motif, through sequence duplication. The acquisition of the RBEIII site has been studied extensively in the context of subtype B (22). Furthermore, our analysis shows that RBEIII duplication is universal among all the HIV-1 genetic subtypes, including subtype C (Fig. 2B). In contrast, the acquisition of the NF-κB (F-κB) site appears to be a property exclusive to subtype C and has not been reported previously. Importantly, duplication or deletion of diverse TFBS in the viral promoter, including those of NF-κB and Sp1, is not by itself a novel theme. A large number of publications have previously reported such LTR modifications in HIV-1 (10, 12, 44-52), HIV-2 (53), and simian immunodeficiency virus (54,55). However, unlike previous reports, our study provides experimental evidence that the new viral strains gained a potential selective advantage as a consequence of the acquired TFBS and, importantly, that these strains have been expanding at the population level. To the best of our knowledge, this is the first report demonstrating a variant viral strain (4-κB or RBEIII duplication variant) progressively spreading at the population level and replacing the canonical parental strains (3-κB subtype C strains).
The data presented here are limited by a few technical issues, primarily the quality and size of the clinical sample set and sequences. Most of the analyses performed here were based on sequences generated primarily through this work and on sequences available from extant databases. Given this limitation, our results can be considered only as inferential evidence suggesting positive evolutionary selection of the variant viral strains. However, our work embodies data collected over a period of a decade. Furthermore, a larger number of viral promoter sequences belonging to the variant viral strains were

F-κB Site Is Biologically Functional in the Viral Context-We confirmed the biological function of the F-κB site using a range of experimental formats. The presence of the F-κB site conferred a quantitative gain-of-function on the viral promoters in the reporter gene expression analyses (Fig. 4). Additionally, binding of the p50-p65 heterodimer to the F-κB site was unequivocally demonstrated in supershift (Fig. 3B), ChIP (Fig. 3D), and Southwestern blot assays (data not shown). However, in the context of the artificial viral promoter FFFF-LTR, the F-κB site supported significantly lower reporter gene expression than the H-κB site in the HHHH-LTR (Fig. 4), suggesting that the F-κB site cannot functionally substitute for the H-κB site in the natural viral promoter. Using ChIP analysis, we further demonstrated that the diminished strength of the F-κB site in the FFFF-LTR is a consequence of the absence of NFAT recruitment to this element due to the C-to-T substitution at position 10 (supplemental Fig. 2). It therefore appears that the F-κB site confers a quantitative gain-of-function on the transcriptional strength of the viral promoter by selectively recruiting NF-κB but not NFAT.
The data presented here demonstrate that the F-κB site is fully functional in the context of the viral promoter and confers a gain-of-function advantage on the variant viral strains.
Some controversy surrounds the functionality of a different NF-κB site in the subtype C promoter, located proximal to the Sp1 sites and referred to here as the C-κB site (5′-GGGGCGTTCC-3′). Using radiolabeled DNA probes and gel shift analyses, two publications previously suggested that this NF-κB site in the subtype C LTR may not be functional. However, neither publication used the authentic sequence with its natural flanking sequences. For instance, Lemieux et al. (56) used a double-stranded synthetic probe that differed from the authentic sequence at two positions (5′-GGGGCGGTCT-3′; differences at positions 7 and 10). Likewise, Roof et al. (57) used a probe containing two tandem copies of the NF-κB site separated by a single T residue. Importantly, the probe containing the C-κB sequence was not used directly for probe binding but only in cold competition with the other probes. Given these technical limitations, it may not be appropriate to conclude that the subtype C-specific NF-κB sequence is not functional, especially in the natural context. In contrast to the above reports, two other publications demonstrated biological function of the C-κB site. Naghavi et al. (10) demonstrated strong binding of cellular factors from HeLa nuclear extracts to the DNA probes, and Montano et al. (12) showed efficient competition between the labeled canonical NF-κB probe and the cold C-κB sequence in the gel shift assay. Furthermore, in an extensive analysis using protein-binding microarrays and diverse NF-κB-binding sequences, Siggers et al. (58) demonstrated that a sequence closely resembling the subtype C-unique C-κB probe (5′-GGGGCGTTCC-3′) has the potential to bind NF-κB. Finally, in our hands, the C-κB site efficiently binds NF-κB from T-cell nuclear extracts (data not shown).
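Because the argument turns on how closely each published probe matches the authentic C-κB sequence, a position-by-position comparison makes the differences explicit. The helper below is a hypothetical illustration: it reports 1-based positions at which two equal-length sequences differ, applied to the two sequences quoted in the paragraph above.

```python
def mismatch_positions(reference, probe):
    """Return 1-based positions where two equal-length DNA sequences differ."""
    if len(reference) != len(probe):
        raise ValueError("sequences must have the same length")
    return [i for i, (r, p) in enumerate(zip(reference.upper(), probe.upper()), start=1)
            if r != p]


C_KB_SITE = "GGGGCGTTCC"      # authentic subtype C-specific C-kB site (5'->3')
LEMIEUX_PROBE = "GGGGCGGTCT"  # synthetic probe sequence cited for Lemieux et al. (56)

if __name__ == "__main__":
    print(mismatch_positions(C_KB_SITE, LEMIEUX_PROBE))  # [7, 10]
```

Two mismatches within a 10-bp recognition element is a substantial departure, which is why binding results obtained with such a probe may not transfer to the authentic site.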
Moderate, but Not Profound, Differences in Replication Fitness Drive Viral Evolution-We measured biological differences between the κB variants using a panel of reporter vectors or paired viruses constructed in an isogenic background. In all these experiments, reporter gene expression analyses and pairwise viral competition assays alike, we found modest, often 1.5-2-fold, differences between the 4- and 3-κB viruses, with the former typically dominating the latter. Such modest differences in reporter gene expression have also been reported when comparing viral promoters of different subtypes (56). Importantly, positive selection of variant forms of pathogenic organisms is typically driven by small, not profound, differences (59). Over time, minor per-cycle differences between competing viral strains accumulate into a measurable replicative advantage for one of the strains. The data presented in our work are statistically significant and offer a convincing explanation for the positive selection of the 4-κB viral strains in natural infection (Fig. 6, C and D). Importantly, even in the highly artificial in vivo EcoHIV model, the domination of the 4-κB viral strains is sustained (Fig. 6B), corroborating the in vitro data obtained using T-cell lines and PBMCs (Fig. 5), which are currently the accepted standards in the field.
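The argument that small per-cycle differences compound can be made concrete with a deterministic two-strain model. Assuming, hypothetically, that the 4-κB strain leaves w times as many progeny per replication cycle as the 3-κB strain, its odds multiply by w each cycle, so its share after n cycles is p_n = p_0 w^n / (p_0 w^n + 1 - p_0). The per-cycle value of w below is illustrative, not measured.

```python
def share_after_cycles(p0, w, n):
    """Share of the fitter strain after n cycles of deterministic competition.

    p0: starting share of the 4-kB strain (0 < p0 < 1)
    w:  hypothetical per-cycle relative fitness (progeny ratio, 4-kB vs 3-kB)
    n:  number of replication cycles
    """
    odds = p0 / (1.0 - p0) * w ** n
    return odds / (1.0 + odds)


if __name__ == "__main__":
    # Starting from a 1:1 mixture, a 5% per-cycle edge shifts the balance steadily.
    for cycles in (0, 10, 20, 50):
        print(cycles, round(share_after_cycles(0.5, 1.05, cycles), 3))
```

The model ignores drift, recombination, and host immune pressure; it only shows that the direction of a small advantage, sustained over many cycles, determines which strain prevails.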
Although the EcoHIV mouse model is a highly artificial system, because multiple differences exist between human and mouse hosts, viral proliferation could be reproducibly demonstrated in this setting. The EcoHIV virus used in the competition assays is essentially the HIV molecular clone NL4-3 in which the native envelope was substituted with that of a mouse virus, murine leukemia virus. In other words, the chimeric virus EcoHIV, like the simian-human immunodeficiency virus, has its own merits. One advantage of EcoHIV is that immunocompetent mouse strains (e.g. regular BALB/c or C57BL) can be used for the study. This animal model therefore offers an inexpensive, simple, and powerful in vivo system for certain applications, especially in resource-constrained countries where access to humanized mouse or primate models is a practical problem. In addition to the two experimental models used here, the dominance of the 4-κB variant viruses was consistent in clinical samples of coinfection and appears to remain stable over extended periods (Fig. 6, C and D). The 4-κB viruses clearly dominated their 3-κB counterparts in all the clinical samples tested, in both the proviral compartment and the plasma virus. The data from natural infection and from the experimental models performed in the NF-κB isogenic context collectively present a strong case that the fitness advantage of the 4-κB viral strains can be attributed to the additional κB site insertion in the viral enhancer. Furthermore, extending these observations to the canonical subtype C viral strains implies a critical role that the additional NF-κB site (the C-κB site) may have played in the past in the successful expansion of the 3-κB-containing subtype C strains compared with the 2-κB-containing strains, although the latter viruses are not currently found.
Replicative Fitness of the 4-κB Viral Strains Probably Confers a Higher Transmission Rate-The correlation of the 4-κB strains with viral load and CD4+ T-cell count in patient samples should be interpreted with caution, because, owing to sampling limitations, the patient samples were derived from a single time point only. Despite these limitations, the 4-κB viral strains appear to have gradually increased in prominence over a decade or longer in India (Fig. 1). The successful expansion of the 4-κB strains at the population level could be the outcome of the relatively higher plasma viral load of these variant viruses (Fig. 8A), because higher plasma viral loads are directly correlated with enhanced transmission efficiency (42,43). Recent studies demonstrated that, in the majority of cases, HIV-1 transmission to a new host involves a single or as few as 2-5 viral strains, even though the donor may harbor a large number of genetically distinct viral strains (60,61). The κB variant strains sharing the same envelope (data not shown) are expected to maintain identical biological properties, including cell tropism, preferred route of transmission, and target cell populations. Given the higher plasma viral load of the 4-κB viral strains in a mixed infection, the probability of transmission of the 4-κB strain to a new host is likely to be higher than that of the 3-κB strain. Nevertheless, given the sampling limitations, we have been cautious not to draw unwarranted inferences from the data.
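The transmission argument can be expressed as a simple probability model. Under the hypothetical assumption that each of k transmitted founder viruses is drawn independently in proportion to the strains' shares of plasma virus, the chance that at least one founder is a 4-κB strain is 1 - (1 - p)^k, where p is the 4-κB share; for a single founder it is simply p. The model ignores any per-virion difference in transmissibility between the strains, and the shares used below are illustrative.

```python
def p_any_4kb_founder(share_4kb, k_founders):
    """Probability that at least one of k independently drawn founders is 4-kB.

    share_4kb:  hypothetical fraction of plasma virus that is 4-kB (0..1)
    k_founders: number of transmitted founder viruses (>= 1)
    """
    if not 0.0 <= share_4kb <= 1.0 or k_founders < 1:
        raise ValueError("share must be in [0, 1] and k_founders >= 1")
    return 1.0 - (1.0 - share_4kb) ** k_founders


if __name__ == "__main__":
    for share in (0.5, 0.7):
        print(share, [round(p_any_4kb_founder(share, k), 3) for k in (1, 2, 5)])
```

With a single founder, as in most transmissions, any increase in the 4-κB share of plasma virus translates one-for-one into its chance of being the transmitted strain under these assumptions.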
Is a Stronger Viral Promoter Essential for Successful Viral Expansion?-Increased gene expression from the viral promoter is expected to cause at least two different problems for viral fitness, potentially offsetting the advantages gained by a higher transmission rate. First, a higher magnitude of gene expression and an enhanced viral load are expected to cause enhanced immune activation, although only an indirect association between these factors has been shown (62). Second, an increased number of NF-κB sites in the viral promoter should be disfavored for the establishment and maintenance of viral latency, given the profound influence this transcription factor has on viral gene transcription (63,64). It therefore appears paradoxical that subtype C adopted the strategy of strengthening its promoter to increase replicative competence. This paradox could perhaps be resolved if subtype C indeed represents an advanced form of viral attenuation (65,66). Whether subtype C represents an evolved form of viral attenuation is highly controversial, and substantial evidence to support such a hypothesis is lacking. Likewise, it is not known whether a higher magnitude of viral attenuation is essential for positive evolutionary selection of the variant viral strains. Despite these limitations, experimental evidence on the molecular basis of phenotypic differences between viral genetic subtypes has been gradually emerging in recent years (67-69). Subtype-dependent differences in cytokine induction profiles have been demonstrated (70-72). We propose that the relatively less virulent nature of subtype C could offer this subtype a window of opportunity to accommodate relatively higher levels of gene expression, as demonstrated here by the increased plasma viral load, without significantly increasing its virulence, as suggested by the absence of a difference in CD4 cell count (Fig. 8C).
We caution that the data presented here are from a cross-sectional analysis that needs confirmation in prospective studies; nevertheless, the findings are consistent with the 4-κB viral strains gaining a replication advantage at the population level as a consequence of their higher infectivity. At present, it is not known whether the 4-κB viruses are more pathogenic than the standard subtype C strains.
Our data also raise several important questions. First, are the variant viral strains of subtype C likely to alter the landscape of HIV demographics in India in the coming years? In the recent past, the rate of viral expansion has slowed or even declined in several global regions, including India (UNAIDS Global Report, 2010). How will viral prevalence be affected by the emerging viral strains in India and elsewhere? Second, are the HIV-1 subtypes, especially subtype C given its supposedly high replication competence, likely to undergo additional evolutionary modifications in the coming years? Considering the high impact that enhanced gene expression could have on immune activation and disease progression (73), it is rather unlikely that 4-κB subtype C would repeat the same evolutionary strategy, acquiring additional NF-κB or other positive regulatory sites to further enhance promoter strength. Third, given that the variant strains have begun to emerge as monoinfections in India (supplemental Fig. 8), is evolutionary divergence between the circulating and emerging viral strains a possibility in the coming years? Almost all the 4-κB viral infections during 2000-2003 in India were found as coinfections with the 3-κB viral strains. In 2010-2011, however, a large number of the 4-κB viral infections were found as monoinfections, suggesting a progressive epidemiological segregation of the two viral variant strains (supplemental Fig. 8).
Furthermore, an extensive comparison of 200 and 10 full-length viral sequences of 3- and 4-κB viral strains, respectively, of South African origin (36), using the Viral Epidemiology Signature Pattern Analysis (VESPA) tool available at the HIV Sequence Database (www.hiv.lanl.gov), failed to identify a differential association of any amino acid or nucleotide residue with either of the κB promoters (3- versus 4-κB) in any of the viral proteins, including Gag, Pol, Env, Tat, Rev, Vif, Vpr, Vpu, and Nef (data not shown). A similar analysis of 10 envelope sequences each of 3- and 4-κB viral strains from our own clinical cohort also failed to identify any preferential association of residues with the κB number. A large number of the 4-κB viral infections in the early years were found as coinfections with the 3-κB viral strains (supplemental Fig. 8). The enormous magnitude of viral recombination in vivo may have obliterated any possible variations between the two variant viral strains. Now that the 4-κB viral strains are found as monoinfections in a substantial number of contemporaneous samples, the manifestations of divergent viral evolution may become identifiable in the coming years. Prospective observational studies will be required to answer many of these questions.
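The signature-pattern comparison can be sketched in simplified form. As we understand it, VESPA compares residue frequencies at each alignment position between a query set and a background set; the code below is not VESPA but a minimal stand-in under that assumption, reporting positions where the most common residue differs between two aligned groups (here, hypothetical 3-κB and 4-κB sequence sets). Ties are broken in favor of the residue encountered first.

```python
from collections import Counter


def consensus_residue(seqs, pos):
    """Most common residue at 0-based alignment position pos (first-seen wins ties)."""
    return Counter(s[pos] for s in seqs).most_common(1)[0][0]


def signature_positions(group_a, group_b):
    """Return (1-based position, consensus_a, consensus_b) where the consensuses differ."""
    length = len(group_a[0])
    if any(len(s) != length for s in group_a + group_b):
        raise ValueError("all sequences must be aligned to the same length")
    hits = []
    for pos in range(length):
        ca = consensus_residue(group_a, pos)
        cb = consensus_residue(group_b, pos)
        if ca != cb:
            hits.append((pos + 1, ca, cb))
    return hits


if __name__ == "__main__":
    # Hypothetical toy alignments, not study sequences.
    three_kb = ["MKVL", "MKVL", "MRVL"]
    four_kb = ["MKIL", "MKIL", "MKVL"]
    print(signature_positions(three_kb, four_kb))  # [(3, 'V', 'I')]
```

An empty result, as reported above for the real alignments, means that no position has a different majority residue in the two groups.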
In conclusion, we demonstrate here that at least three different promoter variant strains of HIV-1 subtype C have been gradually expanding and replacing the standard subtype C viruses in India, and possibly in South Africa and other global regions, over the past decade. The new viral strains contain an additional NF-κB, NF-κB-like, or RBEIII site or a combination of the last two sites. Although the acquisition of an additional RBEIII site is a property shared by all the HIV-1 subtypes, acquisition of an additional NF-κB site remains an exclusive property of subtype C. The acquired κB site is genetically distinct; it binds the p50-p65 heterodimer and strengthens the viral promoter at the levels of transcription initiation and elongation. The 4-κB viruses dominate the 3-κB isogenic viral strains in pairwise competition assays in T-cell lines, primary cells, and the EcoHIV mouse model. The dominance of the 4-κB viral strains is also evident in the natural context when subjects are coinfected with the κB-variant viral strains. The mean plasma viral loads, but not CD4 counts, are significantly different in 4-κB infection, suggesting that these newly emerging strains are probably more infectious. It is possible that higher plasma viral loads underlie the selective transmission of the 4-κB viral strains. Our current results suggest that subtype C exploits a small window of opportunity to achieve a higher viral load, probably without eliciting a higher magnitude of immune activation. Future studies are needed to validate this idea.