Introduction
The abbreviations used are: ssDNA, single-stranded DNA; nt, nucleotide(s); smMT, single-molecule magnetic tweezers; KSHV, Kaposi's sarcoma-associated Herpesvirus; HR, homologous recombination; ALT, alternative lengthening of telomeres; G4-FID, G-quadruplex fluorescence intercalator displacement; TO, thiazole orange; SPM, superparamagnetic; FA, fluorescence area; knt, kilonucleotides.
DNA replication, transcription, recombination and repair all involve the generation of single-stranded DNA (ssDNA) segments (
1- Fujioka K.
- Aratani Y.
- Kusano K.
- Koyama H.
Targeted recombination with single-stranded DNA vectors in mammalian cells.
,
2Enzymatic mechanisms of DNA replication.
3Activating transcription from single stranded DNA.
). In some cases, such as resection of DNA ends during repair or creation of an uncoupled replication fork, these ssDNA segments could span several hundred nucleotides or more (
4- Ma W.
- Westmoreland J.W.
- Resnick M.A.
Homologous recombination rescues ssDNA gaps generated by nucleotide excision repair and reduced translesion DNA synthesis in yeast G2 cells.
). Mixed sequence ssDNA segments are assumed to behave as random coils with some (weak) secondary structure caused by local base pairing (
5- Goddard N.L.
- Bonnet G.
- Krichevsky O.
- Libchaber A.
Sequence dependent rigidity of single stranded DNA.
,
6- Zhang Y.
- Zhou H.
- Ou-Yang Z.C.
Stretching single-stranded DNA: interplay of electrostatic, base-pairing, and base-pair stacking interactions.
). However, if the ssDNA is long and consists of arrays of nucleotide repeats that could fold into macroscopic structures, this could profoundly affect subsequent events of DNA metabolism. One well-known example is the trinucleotide repeats linked to the triplet repeat diseases, in which it is thought that as the replication fork passes these repeats, the ssDNA can fold into long hairpin structures and generate repeat expansions. The telomeric hexanucleotide repeat (5′-TTAGGG-3′) found in all mammals and many other eukaryotes (
7Structure and function of telomeres.
) provides another example of a repeating sequence with the potential of folding into large macroscopic structures when present in a single-stranded state. The telomeric repeat is a member of a subset of repetitive DNAs found widely across nature containing runs of three or four contiguous guanine bases that as ssDNA have the ability to form G-quadruplexes (
8Formation of parallel four-stranded complexes by guanine-rich motifs in DNA and its implications for meiosis.
). Other examples include the replication origin of Kaposi’s sarcoma-associated Herpesvirus (KSHV), which contains 26 and 22 short repeats, each with three or four guanines (
9- Nicholas J.
- Zong J.C.
- Alcendor D.J.
- Ciufo D.M.
- Poole L.J.
- Sarisky R.T.
- Chiou C.J.
- Zhang X.
- Wan X.
- Guo H.G.
- Reitz M.S.
- Hayward G.S.
Novel organizational features, captured cellular genes, and strain variability within the genome of KSHV/HHV8.
), as well as the C9orf72 gene locus, in which expansion of the 5′-GGGGCC-3′ repeat from 2–24 copies in the genome to sometimes several thousand copies is the cause of the most common form of familial amyotrophic lateral sclerosis and frontotemporal dementia (
10- Rademakers R.
- Neumann M.
- Mackenzie I.R.
Advances in understanding the molecular basis of frontotemporal dementia.
). G-quadruplexes form when runs of two or more guanine residues in a DNA strand stack on each other in a square planar structure stabilized by Hoogsteen base pairing (
8Formation of parallel four-stranded complexes by guanine-rich motifs in DNA and its implications for meiosis.
,
11- Gellert M.
- Lipsett M.N.
- Davies D.R.
Helix formation by guanylic acid.
,
12- Bochman M.L.
- Paeschke K.
- Zakian V.A.
DNA secondary structures: stability and function of G-quadruplex structures.
). They are further stabilized by certain cations, in particular K+ (
13G-quadruplexes and their regulatory roles in biology.
,
14A sodium-potassium switch in the formation of four-stranded G4-DNA.
). The identification of telomeric G-quadruplexes in vitro was first reported in 1989 (
15Telomeric DNA dimerizes by formation of guanine tetrads between hairpin loops.
,
16- Williamson J.R.
- Raghuraman M.K.
- Cech T.R.
Monovalent cation-induced structure of telomeric DNA: the G-quartet model.
). The existence of G-quadruplexes in vivo was initially recognized by immunohistochemistry in the telomeric DNA of the ciliate Stylonychia lemnae (
17- Paeschke K.
- Juranek S.
- Simonsson T.
- Hempel A.
- Rhodes D.
- Lipps H.J.
Telomerase recruitment by the telomere end binding protein-β facilitates G-quadruplex DNA unfolding in ciliates.
,
18- Schaffitzel C.
- Berger I.
- Postberg J.
- Hanes J.
- Lipps H.J.
- Plückthun A.
In vitro generated antibodies specific for telomeric guanine-quadruplex DNA react with Stylonychia lemnae macronuclei.
) and later throughout the human genome (
19- Biffi G.
- Tannahill D.
- McCafferty J.
- Balasubramanian S.
Quantitative visualization of DNA G-quadruplex structures in human cells.
). Only a portion of the human cellular G-quadruplex immunohistochemistry signal was detected at telomeres, consistent with the presence of multiple and likely different repeat sequences throughout the genome capable of forming G-quadruplexes (
19- Biffi G.
- Tannahill D.
- McCafferty J.
- Balasubramanian S.
Quantitative visualization of DNA G-quadruplex structures in human cells.
).
There has been growing interest in the potential role of G-quadruplex formation at the ssDNA termini of mammalian and many other eukaryotic telomeres, in which the G-rich strand extends beyond the dsDNA–ssDNA junction as a 3′-single-stranded overhang. In humans, this ssDNA extension is ∼150–300 nt in length (
7Structure and function of telomeres.
), and studies have pointed to the formation of G-quadruplexes in the overhang and their role in inhibiting telomerase, as well as regulating binding and action by the shelterins (
20How shelterin protects mammalian telomeres.
).
A 24-nt ssDNA consisting of (5′-TTAGGG-3′)4 has been crystallized as a model for the telomeric overhang. In the resulting structure, the DNA is arranged into a flat disk 0.6 nm in height and 4 nm on a side with the DNA adopting a parallel G-quadruplex conformation and with TTA triplets looping out in a propeller fashion (
21- Parkinson G.N.
- Lee M.P.
- Neidle S.
Crystal structure of parallel quadruplexes from human telomeric DNA.
). However, NMR studies and other solution approaches have argued that the 24-nt units exist in hybrid or anti-parallel conformations, suggesting that the parallel arrangement may be promoted by crystallization (
22- Li J.
- Correia J.J.
- Wang L.
- Trent J.O.
- Chaires J.B.
Not so crystal clear: the structure of the human telomere G-quadruplex in solution differs from that present in a crystal.
). Potential condensation of G-quadruplex folds into larger structures has been proposed but not experimentally demonstrated (
23Higher-order quadruplex structures.
).
Telomeric repeats also exist in extremely long nucleic acid chains in the form of G-rich transcripts of telomeres termed TERRA (5′-UUAGGG-3′)n. In human cells, TERRA molecules of nearly 9000 nt in length have been observed (
24- Azzalin C.M.
- Reichenbach P.
- Khoriauli L.
- Giulotto E.
- Lingner J.
Telomeric repeat containing RNA and RNA surveillance factors at mammalian chromosome ends.
,
25- Schoeftner S.
- Blasco M.A.
Developmentally regulated transcription of mammalian telomeres by DNA-dependent RNA polymerase II.
). We previously demonstrated that TERRA is arranged into chains of 24-nt particles joined by 3-nt linkers and provided evidence that these particles are stabilized by G-quadruplex folds (
26Structure of long telomeric RNA transcripts: the G-rich RNA forms a compact repeating structure containing G-quartets.
). Thus, a 24-nt particle stabilized by G-quadruplex formation may be shared by both the telomeric G-rich ssDNA and RNA forms. Moreover, some TERRA RNA remains present in telomeric R-loops following transcription in many organisms (
27TERRA and the state of the telomere.
). Telomeric R-loops have been shown to promote homologous recombination (HR) and are prevalent in a significant number of cell lines and tumors exhibiting the alternative lengthening of telomeres (ALT) phenotype (
27TERRA and the state of the telomere.
).
It is important that we understand the kinds of macroscopic folding that ssDNA segments anywhere from 200 to 2000 nt or more in length can adopt at sites of transcription, recombination, or uncoupled replication forks. Doing so requires generating very long ssDNA substrates so that any effect of DNA ends on the internal folding is minimized. In this study, we generated very long ssDNA (up to 20,000 nt) composed of the G-strand telomeric repeat (5′-TTAGGG-3′), the C-strand complement repeat (5′-AATCCC-3′), and a mutant G-strand repeat (5′-TTAGTG-3′). We then interrogated their structure using EM and single-molecule magnetic tweezers (smMT) force extension analysis, two methods capable of providing structural information about very large nucleic acid molecules. We demonstrate that the G-strand telomeric ssDNA spontaneously condenses into a thick bead-like filament as imaged by EM and resolved by stepwise elongation following force extension analysis by smMT. Moreover, we show that switchable ATP-dependent RAD51 ssDNA binding could be used to probe the structure of these higher-order G-strand configurations.
Discussion
Extending the structural studies of short (∼20–150 nt) G-strand repeat ssDNAs to the analysis of much longer G-strand repeat ssDNAs has been hampered by the lack of well-defined substrates. To overcome this technical issue, we further developed a rolling circle replication system as previously described (
59- Brockman C.
- Kim S.J.
- Schroeder C.M.
Direct observation of single flexible polymers using single stranded DNA.
,
60- Lee K.S.
- Marciel A.B.
- Kozlov A.G.
- Schroeder C.M.
- Lohman T.M.
- Ha T.
Ultrafast redistribution of E. coli SSB along long single-stranded DNA via intersegment transfer.
) and adapted it for generating long ssDNA containing distinct repeating nucleotide sequences. This method allowed us to produce G-strand, C-strand, and mutated G-strand ssDNAs containing different forms of the human telomeric repeat sequence ranging in size from 100 to >20,000 nt (Fig. S1).
G4-FID analysis confirmed the presence of G-quadruplex structures within the long human G-strand ssDNA (5′-TTAGGG-3′)n, but not in the C-strand ssDNA (5′-AATCCC-3′)n or a mutant G-strand ssDNA (5′-TTAGTG-3′)n that interrupts the contiguous G-triplet within the telomere repeat. Using EM we showed that very long human telomeric G-strand ssDNA spontaneously condenses into chains of large discrete bead-like particles with 5- and 8-nm diameters. These bead-like particles compact the ssDNA nearly 12-fold in length. Using size-selected DNA, we determined that the larger beads contain ∼240 nt and inferred that the smaller beads contained half this amount (∼120 nt).
Unlike EM, which requires a final dehydration step, smMT studies may be performed in solution at physiological ionic strength with K+, the predominant intracellular cation. Force extension analysis indicated several discrete G-strand ssDNA elongation step sizes. Because extension forces >20 pN are required to unravel a folded 24-nt G-quadruplex (
47- de Messieres M.
- Chang J.C.
- Brawn-Cinani B.
- La Porta A.
Single-molecule study of G-quadruplex disruption using dynamic force spectroscopy.
,
48- Lynch S.
- Baker H.
- Byker S.G.
- Zhou D.
- Sinniah K.
Single molecule force spectroscopy on G-quadruplex DNA.
), we interpreted these results to indicate that the extension steps represented fundamental units of higher-order interaction structures. The weak positive correlation between step size and extension force (Fig. S7) suggests that the higher-order structures are intrinsically more stable the larger they become. In EM fields of the compacted G-strand ssDNA, we noted some particles appeared less organized than others. This might suggest that in addition to the principal structures, other arrangements may be possible that might contribute to the force-extension “noise” that extends outside the baseline 10-nm resolution of the smMT instrument. Taken together, these observations represent the first physical evidence for intramolecular higher-order structures formed by G-strand ssDNA containing embedded G-quadruplexes. These findings appear broadly applicable for the long repetitive blocks of G-rich sequences present in many genomes including the human genome.
We calculated that the major tetramer and octamer higher order structures observed as initial step sizes by smMT would on average contain ∼120 nt (four G-quadruplexes plus four extra repeat loops) and ∼240 nt (eight G-quadruplexes and eight extra repeat loops), respectively. The amount of the octamer higher order structure DNA content is remarkably consistent with the calculated amount of ssDNA associated with the large EM beads (247 nt) and by extension the DNA content of the smaller beads that appear approximately half the size. These two initial smMT extension steps account for 63% of the total events, which appears consistent with the two major bead sizes observed by EM analysis.
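The nucleotide bookkeeping behind these estimates can be checked in a few lines. The sketch below is illustrative only: the function name and constants are ours, and it assumes that each G-quadruplex occupies four 6-nt repeats (24 nt) and that each extra repeat loop is a single 6-nt repeat, as implied by the figures quoted above.

```python
# Assumed sizes (taken from the text): one TTAGGG repeat is 6 nt and one
# G-quadruplex folds four repeats (24 nt). Names here are hypothetical.
REPEAT_NT = 6
G4_REPEATS = 4


def particle_nt(n_quadruplexes, loops_per_quadruplex=1):
    """Approximate ssDNA content (nt) of a higher-order particle."""
    g4_nt = n_quadruplexes * G4_REPEATS * REPEAT_NT
    loop_nt = n_quadruplexes * loops_per_quadruplex * REPEAT_NT
    return g4_nt + loop_nt


tetramer_nt = particle_nt(4)  # four G-quadruplexes plus four extra repeats
octamer_nt = particle_nt(8)   # eight G-quadruplexes plus eight extra repeats
print(tetramer_nt, octamer_nt)  # 120 240
```

Under these assumptions the octamer estimate (240 nt) sits close to the 247 nt inferred for the large EM beads.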
We developed a model based on the stochastic formation of G-quadruplexes, followed by condensation of intervening G-repeats (Fig. 5). An intermolecular side-by-side pairing of G-repeat ssDNA, termed G-wires, has been described using small oligonucleotides (5′-GGGGTTGGGG-3′) (
61G-wires: self-assembly of a telomeric oligonucleotide, d(GGGGTTGGGG), into large superstructures.
). In theory, G-wires might provide a plausible structure for the association(s) of intervening telomeric ssDNA repeats. However, we note that the intermolecular G-quadruplex folds predicted by G-wires should display comparable stability to force extension as bona fide intramolecular G-quadruplexes. A less robust interaction between the intervening repeat G-residue pseudo-folds appears more consistent with our studies, which showed that both magnetic force and the weak ssDNA binding activity of RAD51 (−ATP) were capable of disrupting the higher-order structures. For example, we interpret the observation that RAD51 (−ATP) increasingly inhibits condensation following repeated release of magnetic force as consistent with increased shielding of intervening repeat ssDNA by weakly bound RAD51, supporting the intervening repeat G-residue pseudo-fold condensation model (Fig. 5).
We performed extensive nuclease digestion analysis of the long G-strand ssDNA exploiting multiple nucleases with the goal of isolating higher order G-strand structures (not shown). However, under the best conditions, we observed a 24-nt band consistent with a protected G-quadruplex along with a smear of higher molecular weight ssDNA. We concluded that the higher-order G-strand structures are only partially resistant to nucleases, which also appears consistent with exposed features of intervening repeat G-residue interactions as proposed in our model (Fig. 5).
We calculated the probability (P) that a higher-order G-strand structure might condense within various lengths of human telomere 3′-ssDNA by assuming that a minimum of four individual ssDNA G-strand repeats separated by at least three G-quadruplex folds are required to form a stable particle (Fig. S8e and “Experimental procedures”). This analysis suggested that significant fractions of 100-nt (P = 0.13) and 200-nt (P = 0.32) human telomeric G-strand ssDNA may stochastically fold into G-quadruplexes, leaving at least four intervening G-repeats, which could condense into a G-residue pseudo-fold. This probability increased dramatically with increasing ssDNA length (Fig. S8e). Moreover, these estimates likely represent a minimum because we do not include the possibility that adjacent ssDNA repeats might nucleate into a G-residue pseudo-fold or consider any potential phasing by the telomere ssDNA–dsDNA junction. Taken as a whole, these studies suggest that in the absence of other factors, higher-order G-strand structures may form naturally on a 3′-ssDNA (
63- Makarov V.L.
- Hirose Y.
- Langmore J.P.
Long G tails at both ends of human chromosomes suggest a C strand degradation mechanism for telomere shortening.
,
64- Wright W.E.
- Tesmer V.M.
- Huffman K.E.
- Levene S.D.
- Shay J.W.
Normal human chromosomes have long G-rich telomeric overhangs at one end.
).
We consider the possibility that a higher order telomeric G-strand structure could serve as a molecular switch that might inhibit or facilitate the interactions of telomeric protein factors such as the shelterin protein complex or telomerase. Compaction into these large structures could aid in shielding the telomere from double-stranded break sensing and repair factors prior to binding by telomeric protein components (
65Complex interactions between the DNA-damage response and mammalian telomeres.
), or they might act as a roadblock for telomerase-mediated telomere elongation (
13G-quadruplexes and their regulatory roles in biology.
). Cech and co-workers (
66- Taylor D.J.
- Podell E.R.
- Taatjes D.J.
- Cech T.R.
Multiple POT1-TPP1 proteins coat and compact long telomeric single-stranded DNA.
) have examined a 144-nt (5′-TTAGGG-3′)n ssDNA bound by hPOT1 and found highly compact DNA-protein particles, suggesting that POT1 is likely to disrupt any higher order G-strand condensation. Indeed, POT1 and its partner TPP1 appear to overcome G-quadruplex roadblocks during telomere metabolic processes.
The self-condensation of long segments of ssDNA containing runs of Gs could have significant global consequences for replication, recombination, and repair. As noted above, if the G-rich strand at the C9orf72 locus (C2G4)n assumed a higher order conformation, this could not only induce replication fork stalling but in itself might lead to further expansion by mechanisms related to replication restart, recombination or repair. Indeed, the expanded repeat has been shown to cause replication fork stalling in vitro (
67DNA replication dynamics of the GGGGCC repeat of the C9orf72 gene.
). The KSHV replication origin contains ∼260 bp of G-rich repeats on one strand. Were this G-rich strand to condense into a higher order structure, this would likely generate a severe bend or knot at the KSHV replication origin with the C-rich strand present in a more open unstructured state. The net result would likely be a complex scaffold with very different protein binding properties than the linear dsDNA. It will be of interest to generate other long ssDNAs, including ones containing the G-rich C9orf72 repeat and the KSHV origin repeats and examine their architecture using the biophysical methods described here.
Experimental procedures
Rolling circle replication
Telomeric ssDNA circles were synthesized using oligonucleotides with either WT G-strand, mutant G-strand, or C-strand sequences (Table S1) containing 20 telomeric repeats (IDT, Coralville, IA) as described earlier (
28- Kar A.
- Willcox S.
- Griffith J.D.
Transcription of telomeric DNA leads to high levels of homologous recombination and t-loops.
). In brief, oligonucleotides were ligated into circles using CircLigase™, and any remaining linear oligonucleotides were digested with ExoI and ExoIII (New England Biolabs Inc., Ipswich, MA) according to the manufacturer's recommendations (Epicenter Biotechnologies, Madison, WI). Circular products were confirmed by 7% denaturing PAGE.
For size selection of long G-strand DNA we used alkaline agarose gel electrophoresis. In brief, 0.8% alkaline agarose gels were prepared in alkaline buffer (10× buffer contains 10 N NaOH and 0.5 M EDTA, pH 8.0). The sample DNA was digested with S1 nuclease and mixed with alkaline gel loading buffer. After electrophoresis, the gel was soaked in neutralizing solution (containing 1 M Tris-HCl and 1.5 M NaCl) and stained with SYBR® Green I nucleic acid gel stain (Invitrogen), followed by isolation of the target DNA length using a QIAquick gel extraction kit (Qiagen).
G-quadruplex fluorescent intercalator displacement (G4-FID)
A constant temperature (20 °C) SPEX Fluorolog-3 (Horiba) spectrofluorometer with thermostatted cell holders (3 ml) was used to perform G4-FID studies in 10 mM lithium cacodylate buffer (pH 7.3) and 100 mM KCl. Briefly, 0.25 μM prefolded DNA in lithium cacodylate KCl buffer was mixed with TO (0.50 μM). Each ligand addition step (from 0.5 to 10 equivalents) was followed by a 3-min equilibration period, after which the fluorescence spectrum was recorded. The percentage of displacement was calculated from the fluorescence area (FA, 510–750 nm, λex = 501 nm), using the following equation: TO displacement (%) = 100 − [(FA/FA0) × 100], where FA0 is the fluorescence of TO bound to DNA without added ligand. The TO displacement (%) was plotted as a function of the concentration of added ligand.
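The displacement calculation can be sketched as follows. This is a minimal illustration, not the analysis code used in the study: the function names are hypothetical, and the fluorescence area is assumed here to be a simple trapezoidal integral of the emission spectrum over 510–750 nm.

```python
def fluorescence_area(wavelengths_nm, intensities, lo=510.0, hi=750.0):
    """Trapezoidal area under an emission spectrum between lo and hi (nm).

    Assumes wavelengths are sorted in ascending order.
    """
    pts = [(w, f) for w, f in zip(wavelengths_nm, intensities) if lo <= w <= hi]
    return sum((w2 - w1) * (f1 + f2) / 2.0
               for (w1, f1), (w2, f2) in zip(pts, pts[1:]))


def to_displacement_percent(fa, fa0):
    """TO displacement (%) = 100 - (FA / FA0) * 100."""
    if fa0 <= 0:
        raise ValueError("FA0 (TO bound to DNA, no ligand) must be positive")
    return 100.0 - (fa / fa0) * 100.0


# Example with made-up numbers: a spectrum whose area falls to a quarter
# of the ligand-free value corresponds to 75% displacement.
print(to_displacement_percent(25.0, 100.0))  # 75.0
```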
Electron microscopy
DNA was adsorbed onto grid supports covered with a thin glow discharge-treated carbon film for 30 s to 1 min in either the buffer already present with the DNA or a buffer containing 2.5 mM spermidine, 10 mM Tris-HCl, 50 mM KCl, 75 mM NaCl, and 1 mM MgCl2 (pH 7.5). The samples were washed in water followed by a series of ethanol dehydration steps, air-drying, and rotary shadow casting with tungsten at 1 × 10⁻⁶ torr (
68- Griffith J.D.
- Christiansen G.
Electron microscope visualization of chromatin and other DNA–protein complexes.
). The samples were visualized using a Tecnai 12 TEM (FEI Inc., Hillsboro, OR) at 40 kV, and the images were collected with a Gatan Orius charge-coupled device camera (Gatan Inc., Pleasanton, CA) with digital micrograph–supporting software (Gatan Inc., Pleasanton, CA). Dimensions of telomeric DNA filaments were measured from digital micrographs using digital micrograph (Gatan Inc., Pleasanton, CA) and ImageJ (National Institutes of Health, Bethesda, MD). Negative staining of the G-rich ssDNA bound by human RAD51 (purified as previously described (
56- Senavirathne G.
- Mahto S.K.
- Hanne J.
- O'Brian D.
- Fishel R.
Dynamic unwrapping of nucleosomes by HsRAD51 that includes sliding and rotational motion of histone octamers.
)) was carried out by incubating 50 ng of G-rich ssDNA with 1500 ng of RAD51 in a buffer containing 20 mM HEPES (pH 7.5), 10% glycerol, 0.5 mM DTT, and 2 mM MgCl2 for 30 min at 37 °C in the presence or absence of 2.5 mM ATP. Drops of the sample were adsorbed to glow-charged thin carbon supports for 3 min, followed by washing with 2% uranyl acetate, air drying, and imaging in a Tecnai 12 as above at 80 kV. Images for publication were arranged and contrast-optimized using Adobe Photoshop CS5 (Adobe Systems, San Jose, CA).
Magnetic tweezers preparation
Flow cells were engineered with glass cover slides affixed with double-sided tape to an aluminum foundation that maximized SPM bead imaging. Telomeric C-strand, G-strand or mutant G-strand ssDNA (30 pM final) was mixed with bead formation buffer (50 mM Tris-Cl, pH 7.6, 100 mM KCl). The combined sample was boiled for 10 min followed by snap chilling on ice for 1–2 h. Prior to attachment, the glass slides were treated with (3-aminopropyl) triethoxysilane followed by a 1:100 mixture of Biotin-PEG SVA to mPEG-SVA (Invitrogen). NeutrAvidin (500 μM; Invitrogen) was injected into the flow cell at a rate of 8 μl min−1, followed by the ssDNA. Tosyl-activated M-280 SPM Dynabeads (ThermoFisher Scientific) were coated with anti-digoxigenin antibodies (Roche). Stock beads were removed and resuspended in 0.1 M borate, pH 9.5 (buffer A). The beads were then resuspended in a mixture of buffer A and anti-digoxigenin at a 20 μg:1 mg antibody:beads ratio. The beads were incubated for 12–17 h at 37 °C with slow tilt rotation. 3 M ammonium sulfate in buffer A (buffer B) was mixed into the beads. The beads were then resuspended in 10 mM Na/K-phosphate (pH 7.4), 140 mM NaCl/KCl (PBS) with 0.5% (w/v) acetylated BSA (Sigma) (buffer C). After incubation at 37 °C for 1 h, the coated beads were resuspended in PBS (pH 7.4) with 0.1% (w/v) acetylated BSA (buffer D). Buffer D was removed and then added to reach the desired bead concentration (2 × 10⁹ beads ml−1).
Anti-digoxigenin–coated beads were mixed with running buffer (50 mM Tris-Cl, pH 7.6, 100 mM KCl, 200 μg μl−1 acetylated BSA, 0.0025% Tween 20; Amresco) and injected at 8 μl min−1 into the flow cell containing the ssDNA bound to the surface via a biotin–NeutrAvidin linkage, while agitating the system. The bound ssDNA was washed extensively with running buffer to remove free SPM beads prior to analysis. Force extension measurements used four 1-cm³ rare earth magnets (neodymium, Magcraft). The SPM beads were imaged using a 530-nm LED lamp (Thorlabs) and a 100× Olympus oil immersion objective, and the images were collected on a 1024 × 1024 pixel charge-coupled device camera (Grasshopper Express 1.0 MP Mono FireWire 1394b) at one frame per 70 ms.
The human RAD51 protein was purified and stored as previously described (
52- Qureshi M.H.
- Ray S.
- Sewell A.L.
- Basu S.
- Balci H.
Replication protein A unfolds G-quadruplex structures with varying degrees of efficiency.
). Force extension analysis was performed in running buffer additionally containing 2 mM MgCl2, 100 μM DTT, and human RAD51 (500 nM) with or without 1 mM ATP (Roche).
Magnetic tweezers data analysis
SPM bead analysis was performed with the 3D bead tracking software Video Spot Tracker (Computer Integrated Systems for Microscopy and Manipulation, University of North Carolina-Chapel Hill). Displacement events were determined using the software Edge Detector (Computer Integrated Systems for Microscopy and Manipulation, University of North Carolina-Chapel Hill) and MATLAB and Statistics Toolbox Release 2014b (MathWorks, Inc., Natick, MA). Edge Detector was edited to avoid scoring negative displacement because these can bias the value of large positive displacement. The data were smoothed with the Savitzky–Golay filter
3 and an average window size of 20 points using Origin software (OriginLab, Northampton, MA). Distributions for displacement events were binned to the resolution of the step size (∼10 nm), with bin size = (Max − Min)/(number of bins − 0.5), where Max is the maximum value of events, Min is the minimum value of events, Bin Start is Min, and Bin End is (bin size × number of bins). The events were fit to multiple Gaussian distributions to determine several different observed displacement changes. The coefficient of determination for different numbers of peaks was used to select the appropriate number of Gaussians to fit. All fittings were done with Origin software. Force and extension values were determined for various magnet positions above the flow cell surface. Forces were calculated by measuring the bead’s fluctuations transverse to the direction of stretching and relating these fluctuations to the force through the equipartition theorem (
43Smoothing and differentiation of data by least squares procedures.
). Double-stranded and mutant G-strand DNA curves were fit to the freely jointed chain model for DNA nonlinear elasticity (
56- Senavirathne G.
- Mahto S.K.
- Hanne J.
- O'Brian D.
- Fishel R.
Dynamic unwrapping of nucleosomes by HsRAD51 that includes sliding and rotational motion of histone octamers.
).
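As a rough illustration of the two calculations named above, the following sketch estimates force from transverse bead fluctuations using the standard equipartition relation F = k_BT·L/⟨δx²⟩ and evaluates the freely jointed chain extension x = L_c[coth(Fb/k_BT) − k_BT/(Fb)]. The function names, the value of k_BT (4.11 pN·nm, roughly room temperature), and the example Kuhn length are assumptions made for illustration and are not taken from the study.

```python
import math
from statistics import pvariance

KBT_PN_NM = 4.11  # assumed thermal energy near 25 °C, in pN·nm


def force_from_fluctuations(transverse_positions_nm, extension_nm, kbt=KBT_PN_NM):
    """Equipartition estimate: F = kBT * extension / variance of transverse motion."""
    variance = pvariance(transverse_positions_nm)  # nm^2
    if variance <= 0:
        raise ValueError("transverse positions must fluctuate")
    return kbt * extension_nm / variance  # pN


def fjc_extension(force_pn, contour_length_nm, kuhn_length_nm, kbt=KBT_PN_NM):
    """Freely jointed chain: x = Lc * (coth(F*b/kBT) - kBT/(F*b))."""
    y = force_pn * kuhn_length_nm / kbt
    return contour_length_nm * (1.0 / math.tanh(y) - 1.0 / y)


# Toy example: a bead held 1000 nm above the surface that wanders +/-10 nm sideways.
print(force_from_fluctuations([-10.0, 10.0], 1000.0))
# Hypothetical chain: 1000-nm contour length, 1.5-nm Kuhn length, 10 pN.
print(fjc_extension(10.0, 1000.0, 1.5))
```

In practice the measured force-extension points would be fit to `fjc_extension` by least squares to obtain the contour and Kuhn lengths; that fitting step is omitted here.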
Simulation of multiple quadruplex formation
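Define an $n$-dimensional array $\bar{a}$, composed of $a_i$, $i = 1, \dots, n$, where $a_i = 0$ for all $i$.
Define Algorithm A.
Create $n$ random numbers in an $n$-dimensional array $\bar{u}$, composed of $u_i$, $i = 1, \dots, n$, where $u_i \sim$ iid $U([0,1])$. Sort the array, $S(\bar{u})$, to create ordered uniformly random numbers $u_i$, $i = 1, \dots, n$, where $u_i$ is the $i$th sorted value of $\bar{u}$. Then let $I(u_i)$ be the original index of the $i$th sorted value of $\bar{u}$. For $i = 1, \dots, n$:
$$\text{if } \sum_{j=0}^{3} a_{I(u_i)+j} = 0, \text{ then set } a_{I(u_i)+j} = 1 \text{ for } j = 0, 1, 2, 3 \qquad (1)$$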
Algorithm A results in $\bar{a}$ having a series or set of blocks $B$, where $b \in B = \{[I(u_i), I(u_i) + 1, I(u_i) + 2, I(u_i) + 3]\}$, for all $i = 1, \dots, n$ satisfying $\sum_{j=0}^{3} a_{I(u_i)+j} = 0$ after potentially reassigning values of $a_i$ through the $(i - 1)$ previous iterations. Note that this implies $a_{I(u_i)+j} = 1$ for $j$ = 0, 1, 2, and 3 for each element $b \in B$ after algorithm A. That is, each element $b$ of $B$ is a set or block of four consecutive indices denoting consecutive positions in the array $\bar{a}$, each having a value of 1, with gaps between blocks having size 0, 1, 2, or 3, which have consecutive values of 0.
Let $g_k$ be the size of the $k$th “gap” between the blocks $b_k$ and $b_{k+1}$, $g_k \in \{0, 1, 2, 3\}$ for all $k = 1, \dots, m$, where $m \sim M$ is a random number representing the total number of gaps that depends on algorithm A and each iteration of algorithm A. The distribution of $M$ over iterations of algorithm A is potentially obscure and may not have a simple representation. The frequencies of each gap length within an iteration of algorithm A, i.e. $N(g_k = 0) = N_0$, $N(g_k = 1) = N_1$, $N(g_k = 2) = N_2$, $N(g_k = 3) = N_3$, are also random numbers from different distributions that are dependent on $n$. By definition, $N_0 + N_1 + N_2 + N_3 = m$.
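One way to realize the simulation described above is sketched below. It is our reading of Algorithm A rather than the authors' code: array positions are taken to be individual G-strand repeats, a block of four consecutive 1s stands for one G-quadruplex, and visiting the positions in the order of sorted iid uniform numbers is implemented as a random shuffle (the two are equivalent ways of drawing a random permutation). All function names are hypothetical.

```python
import random
from collections import Counter


def simulate_quadruplex_folding(n_repeats, rng):
    """Visit repeats in random order; fold a block of 4 whenever all 4 are free."""
    a = [0] * n_repeats
    order = list(range(n_repeats))
    rng.shuffle(order)  # same distribution as sorting iid U(0,1) draws
    starts = []
    for start in order:
        if start + 4 <= n_repeats and sum(a[start:start + 4]) == 0:
            a[start:start + 4] = [1, 1, 1, 1]
            starts.append(start)
    return a, sorted(starts)


def gap_sizes(starts):
    """Unfolded repeats between consecutive blocks (each value is 0, 1, 2, or 3)."""
    return [nxt - (cur + 4) for cur, nxt in zip(starts, starts[1:])]


def gap_frequencies(n_repeats, trials=10000, seed=0):
    """Empirical frequency of each gap size over many simulated molecules."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(trials):
        _, starts = simulate_quadruplex_folding(n_repeats, rng)
        counts.update(gap_sizes(starts))
    total = sum(counts.values())
    return {g: (counts[g] / total if total else 0.0) for g in range(4)}


# 20 repeats corresponds to 120 nt of TTAGGG repeats.
print(gap_frequencies(20))
```

Because every position is eventually visited, no run of four or more unfolded repeats can survive, which is why the gap sizes are confined to 0–3.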
Calculating the probability of higher-order G-strand structure formation
A solution to the likelihood that a human telomere G-strand repeat sequence (TTAGGG)n supports an environment compatible with higher-order structure formation begins by defining a condition where four consecutive repeats (TTAGGG)4 fold into a G-quadruplex, followed by 1–3 repeats ((TTAGGG)1–3; termed a “gap”). This chain of G-strand repeats can be thought of as a “unit” containing a G-quadruplex fold with adjacent gap, surrounded by G-quadruplex folds containing no gaps.
Structural analysis has suggested that three gaps with an adjacent G-quadruplex (3 units) plus one additional gap of 1–3 G-strand repeats are minimally required to form a higher-order structure. Thus, we can reframe the probability of a nucleotide sequence having conditions compatible with higher-order structure formation as the probability that at least four successive gaps will occur within the G-strand sequence. If we term developing a single gap as a “success,” then Feller described a simple method for determining the probability of at least $r$ consecutive successes (gaps) in $n$ Bernoulli trials (total gaps),
$$P(r, n) \approx 1 - \frac{1 - px}{(r + 1 - rx)\,q} \cdot \frac{1}{x^{n+1}} \qquad (2)$$
where $p$ is the probability of success of a single gap, $q = 1 - p$, and $x$ is the real root of
$$1 - x + q\,p^{r}x^{r+1} = 0 \qquad (3)$$
which cannot equal $1/p$.
In our case, the success probability ($p$) depends on the total probability of 1–3 repeats occurring in a defined DNA length and can be empirically determined from simulated quadruplex formation. As an example, for 120 nt of 3′-ssDNA: total probability = 0.28 (1 repeat) + 0.23 (2 repeats) + 0.14 (3 repeats) = 0.65. From simulation, we can also calculate $n$ total gaps in $m$ 3′-ssDNA nucleotides from the average gap size (Ave) expanded to the unit size,
$$n = \frac{m}{6} \times \frac{1}{4 + \mathrm{Ave}} \qquad (4)$$
where $m/6$ is the maximum number of G-repeats in $m$ nucleotides of 3′-ssDNA. As an example, for 120 nt of 3′-ssDNA containing a hexameric repeat (TTAGGG): (120 ÷ 6) × 1/[4 (repeats per G-quadruplex) + 1.15 (average repeats between G-quadruplexes)] = 3.88. For simplicity, 3.88 is rounded to the nearest integer 4, effectively making the resulting probability ($P$) calculation a minimum.
From Equation 2, we can then determine the probability of at least four consecutive gaps in $m$ nucleotides. As an example, for 120 nt, $q \times p^4 = 0.35 \times (0.65)^4 = 0.06$; $x = 1.10$, leading to a probability $P(r = 4, n \approx 4) = 0.16$. These results suggest that within a population of 120-nt 3′-ssDNA, 16% will stochastically condense into a higher-order G-strand structure. As expected, Equation 2 becomes more accurate as $n$ increases. The probability of higher-order G-strand structure versus 3′-ssDNA length is plotted.
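The 120-nt worked example can be reproduced numerically. The sketch below assumes the standard form of Feller's asymptotic approximation for runs: the probability of no run of $r$ successes in $n$ trials is approximately $(1 - px)/[(r + 1 - rx)q] \cdot x^{-(n+1)}$, where $x$ is the smallest positive root of $1 - x + qp^{r}x^{r+1} = 0$. Function names are hypothetical, and the fixed-point iteration used to find $x$ is our choice; it is only expected to converge to the intended root when $p < r/(r + 1)$.

```python
def feller_root(p, r, iterations=500):
    """Smallest positive root of 1 - x + q*p**r * x**(r+1) = 0, by fixed-point iteration."""
    q = 1.0 - p
    x = 1.0
    for _ in range(iterations):
        x = 1.0 + q * p**r * x**(r + 1)
    return x


def prob_run_at_least(p, r, n):
    """Approximate probability of at least r consecutive successes in n trials."""
    q = 1.0 - p
    x = feller_root(p, r)
    no_run = (1.0 - p * x) / ((r + 1 - r * x) * q) / x**(n + 1)
    return 1.0 - no_run


def total_gaps(m_nt, ave_gap_repeats, repeat_nt=6, repeats_per_g4=4):
    """Number of gap 'trials' in m nucleotides: (m/6) / (4 + average gap size)."""
    return (m_nt / repeat_nt) / (repeats_per_g4 + ave_gap_repeats)


# Worked example from the text: 120 nt, p = 0.65, average gap of 1.15 repeats.
n = round(total_gaps(120, 1.15))      # 3.88 rounds to 4
x = feller_root(0.65, 4)              # about 1.10
print(n, round(x, 2), round(prob_run_at_least(0.65, 4, n), 2))  # 4 1.1 0.16
```

For so few trials the asymptotic formula is only approximate (the exact value for four successes in four trials is 0.65⁴ ≈ 0.18), which is in line with the remark that the approximation improves as the number of trials grows.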
Author contributions
A. K., N. Ö. A., R. F., and J. D. G. conceptualization; A. K. and J. D. G. resources; A. K., N. J., N. Ö. A., R. F., and J. D. G. formal analysis; A. K., R. F., and J. D. G. supervision; A. K., R. F., and J. D. G. funding acquisition; A. K., N. J., N. Ö. A., R. F., and J. D. G. validation; A. K., N. J., N. Ö. A., R. F., and J. D. G. investigation; A. K., N. J., N. Ö. A., R. F., and J. D. G. visualization; A. K., N. J., N. Ö. A., R. F., and J. D. G. methodology; A. K., N. J., N. Ö. A., R. F., and J. D. G. writing-original draft; A. K. and J. D. G. project administration; A. K., N. J., N. Ö. A., R. F., and J. D. G. writing-review and editing; N. J. and N. Ö. A. data curation; N. J. and N. Ö. A. software.
Article info
Publication history
Published online: April 19, 2018
Received in revised form: April 4, 2018
Received: January 29, 2018
Edited by Patrick Sung
Footnotes
This work was supported by National Institutes of Health Grants GM31819 and ES013773 (to J. D. G.) and GM080176 (to R. F.). The authors declare that they have no conflicts of interest with the contents of this article. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
This article contains Table S1, Figs. S1–S9, and Notes S1 and S2.
Copyright
© 2018 Kar et al.