DNA Sequence Alignment during Homologous Recombination*

Homologous recombination allows for the regulated exchange of genetic information between two different DNA molecules of identical or nearly identical sequence composition, and is a major pathway for the repair of double-stranded DNA breaks. A key facet of homologous recombination is the ability of recombination proteins to perfectly align the damaged DNA with homologous sequence located elsewhere in the genome. This reaction is referred to as the homology search and is akin to the target searches conducted by many different DNA-binding proteins. Here I briefly highlight early investigations into the homology search mechanism, and then describe more recent research. Based on these studies, I summarize a model that includes a combination of intersegmental transfer, short-distance one-dimensional sliding, and length-specific microhomology recognition to efficiently align DNA sequences during the homology search. I also suggest some future directions to help further our understanding of the homology search. Where appropriate, I direct the reader to other recent reviews describing various issues related to homologous recombination.

Homologous Recombination
Homologous recombination enables the exchange of genetic information between different DNA molecules and is a major driving force in evolution. Homologous recombination contributes to double-strand DNA break (DSB) repair, the rescue of stalled or collapsed replication forks, programmed and aberrant chromosomal rearrangements, horizontal gene transfer, and meiosis (1-5). The importance of homologous recombination is underscored by the findings that defects in key recombination proteins can result in a loss of genome integrity and lead to gross chromosomal rearrangements that are a hallmark of cancer. During recombination, a presynaptic ssDNA is paired with the complementary strand of a homologous dsDNA, resulting in the displacement of the noncomplementary strand from the duplex to generate a D-loop intermediate (Fig. 1) (6-8). This intermediate can be channeled through a number of alternative pathways, any of which can allow for the repair of the originally broken DNA molecule using information derived from the template (1,9,10).
Key reactions in homologous recombination are catalyzed by the Rad51/RecA family DNA recombinases, which are ATP-dependent proteins that form helical filaments on DNA (6-8, 11). These recombinases are broadly conserved, and prominent family members include bacterial RecA, the archaeal protein RadA, and the eukaryotic recombinases Rad51 and Dmc1. RecA is the archetypal recombinase originally identified in genetic screens for Escherichia coli mutants defective in recombination by Clark and Margulies in 1965 (12). This discovery set the stage for years of investigation into the RecA protein, including its biochemical purification and characterization by the Radding, Howard-Flanders, Lehman, and Roberts laboratories (13-20).
The importance of this early work on bacterial recombination was underscored by studies in 1992 from the Ogawa and Kleckner groups (21,22), which identified the first eukaryotic Rad51 and Dmc1 recombinases, respectively, with amino acid sequence homology to RecA. The identification of these eukaryotic recombinases provided the crucial confirmation that recombination proteins were conserved throughout biology, and enabled biochemical study of eukaryotic recombination. Subsequent work from the Sung and West laboratories (23,24) demonstrated that yeast and human Rad51 catalyze the same types of strand exchange reactions as RecA, providing a clear indication that the prokaryotic and eukaryotic recombinases promoted recombination through conserved mechanisms. Although Dmc1 proved more recalcitrant to biochemical analysis, this meiosis-specific recombinase was eventually shown to catalyze strand exchange (25). Interestingly, Dmc1 is thought to have diverged from Rad51 shortly after the emergence of eukaryotes (26,27), although the reason why most eukaryotes require a specialized meiotic recombinase for meiosis remains unclear (28,29).
Electron microscopy studies showed that RecA, Rad51, and Dmc1 all form extended polymers on DNA and stretch the bound DNA by ~50% relative to the contour length of normal B-form DNA (23, 30-33). It was often assumed that the DNA was uniformly extended, but no one could obtain diffraction quality crystals of any recombinase bound to DNA. A major breakthrough came from the Pavletich group (34), which in 2008 reported crystal structures of RecA-ssDNA and RecA-dsDNA pre- and post-synaptic complexes, revealing that the DNA was organized into near B-form base triplets separated by ~8 Å between adjacent triplets (Fig. 1B). We refer to this unique DNA architecture as RS-DNA (Rad51/RecA stretched-DNA) to help distinguish it from other forms of mechanically stretched DNA (35).
We have learned much about the basic biochemical and biophysical features of homologous recombination since the identification of the first recA mutants in 1965. We know now what recombinases are involved, what types of reactions they catalyze, and the genetic consequence of defects in these proteins, and we even have atomic level details for some proteins and complexes. Despite our depth of understanding, many questions remain, and of central importance are questions regarding reaction dynamics and transient intermediates that are essential to any biochemical pathway.
* This work was supported by National Institutes of Health Grant GM074739 and National Science Foundation (NSF) Grant MCB-1154511 (to E. C. G.). The author declares that he has no conflicts of interest with the contents of this article. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

The Homology Search
In 1979, Charles Radding wrote, "Nothing is more intriguing about homologous recombination than its beginning" (13). For our purposes, we will consider the "beginning" as what takes place during the alignment of DNA strands during the homology search. The remainder of this minireview will attempt to describe our current understanding of the homology search, and I also direct the reader to excellent reviews, each providing a different perspective on the topic (36-39).
The homology search underlies all homologous recombination reactions, and its importance can be understood by recognizing that misalignment by even a single base pair can potentially render crucial genetic information inaccessible. The presynaptic complex can be considered a large site-specific DNA binding entity, whose specificity is defined by the sequence of the bound ssDNA. Therefore the principles derived from studies of the biological target search processes of simpler site-specific DNA-binding proteins also apply to the homology search. Our understanding of these types of search problems goes back to early experimental work of Riggs et al. (40), who in 1970 demonstrated that the lac repressor could locate target sites more quickly than expected based on random diffusion. This surprising observation prompted Peter von Hippel, Otto Berg, and colleagues (41, 42) to establish three models to explain how DNA-binding proteins might locate target sites. These hypothetical mechanisms include (i) hopping, where the protein moves via a series of microscopic dissociation and rebinding events; (ii) one-dimensional diffusion or sliding, which involves a random walk along the DNA without dissociation; and (iii) intersegmental transfer, involving movement from one site to another via a looped intermediate. Collectively referred to as facilitated diffusion, these models laid the groundwork for most subsequent target search studies.

Early Biochemical Insights
The availability of purified RecA led to rapid advances in our understanding of what reactions the protein could catalyze and insights into how these activities might be related to genetic recombination. As highlighted below, several early efforts were directed toward understanding the homology search.
Many of the earliest attempts to study the homology search were made by Radding and colleagues (43-46). Through a series of studies using plasmid-length substrates, they reported that strand exchange proceeded through an intermediate referred to as a "coaggregate," composed of a stable intermeshed network of RecA presynaptic complexes and dsDNA, and that the search itself involved facilitated diffusion within these coaggregates. The Lehman and Kowalczykowski laboratories (18,47,48) revealed that RecA does not require ATP hydrolysis to promote strand exchange, demonstrating that the search process must be driven by thermal energy. Camerini-Otero and colleagues (49) later found that RecA can form stable pairing interactions with homologous substrates as short as 8 bp, providing perhaps the first hint that RecA might employ a length-based recognition mechanism to help align homologous sequences (see below). Finally, the Adzuma laboratory (50) employed a novel approach based on tandem arrays of homologous target sites to ask whether one-dimensional sliding might contribute to the homology search. The premise of these assays was straightforward: if sliding contributed to the search, then there should be preferential recognition of the outermost targets within a tandem array, whereas a search based upon random three-dimensional diffusion would allow efficient recognition for any site regardless of its position within the array. These experiments convincingly ruled out the involvement of extensive one-dimensional sliding during the homology search. In sum, these early investigations supported a search mechanism involving: (i) facilitated diffusion within large nucleoprotein networks; (ii) no need for ATP hydrolysis; (iii) the possibility of length-based sequence recognition; and (iv) the absence of long-distance one-dimensional sliding.
Given these observations, and considering the polyvalent nature of the presynaptic complex, it seemed plausible that the homology search might take place via the original intersegmental transfer model of Berg and von Hippel (41, 42). However, definite proof of this mechanism would need to await development of new technologies capable of watching the search take place in real time.

Single-molecule Studies
Following the early biochemical work, further advances in our understanding of the homology search came much more slowly. This can perhaps be ascribed to two reasons: first, there were (and remain) many other exciting aspects of recombination that warranted investigation; and second, target search problems are very difficult to study because the relevant intermediates are both transient and heterogeneous. However, the advent of single-molecule approaches, which have the unique capacity to probe transient and heterogeneous intermediates, led to resurgent interest in the homology search, and the common theme among the studies described below is that they sought to observe search intermediates using optical microscopy.
One of the first single molecule studies of the homology search came from Forget and Kowalczykowski (51). These studies first sought to visualize fluorescently labeled RecA-ssDNA presynaptic complexes (430 or 1,762 nt in length) interacting with dsDNA (λ phage DNA; 48,502 bp) tethered by either one or two ends to the surface of a sample chamber. Initial experiments revealed that homologous pairing interactions could only be detected with dsDNA molecules that were tethered by one end to the sample chamber surface (no homologous pairing interactions were found for extended dsDNAs tethered by both ends), suggesting that a successful homology search required randomly coiled dsDNA. This hypothesis was verified by elegant microscopy experiments that visualized RecA-ssDNA presynaptic complexes as they searched for homology within a dsDNA whose ends were held within a dual-beam optical trap, which was used to manipulate the extension of the dsDNA (Fig. 2A) (51, 52). In addition, this study identified the existence of intermediates involving multiple, simultaneous contacts between the presynaptic complex and the dsDNA via long-distance looping interactions. Taken together, these experiments indicated that the homology search required a collapsed dsDNA, allowing interactions to take place within three-dimensional space through a mechanism consistent with the original intersegmental transfer model from von Hippel and colleagues (41, 42). This crucial finding remains perhaps the clearest experimental example of any biological target search based upon intersegmental transfer.
Shortly thereafter, the laboratory of Taekjip Ha (53) demonstrated that one-dimensional sliding over short distances may contribute to the alignment of DNA sequences by RecA. These experiments used single molecule fluorescence resonance energy transfer (smFRET) to show that short nonhomologous dsDNA fragments could slide along a RecA presynaptic filament (Fig. 2B) (53, 54). The resulting data revealed a diffusion coefficient of ~0.9 × 10⁻³ μm²/s, corresponding to a sliding distance of only ~60-300 bp, which was in good agreement with prior bulk biochemical results (50). These authors also used a clever assay with two short homologous sequences embedded within the same presynaptic ssDNA, allowing the dsDNA to be aligned into either of two positions. Remarkably, they detected two distinct FRET signals, and when these homologous sequences were between 5 and 7 nt, the dsDNA could rapidly oscillate between the two sites. Interestingly, when the length of homology was increased to 8 nt, the dsDNA could again align at either of the two sites, but could no longer oscillate between the sites. This finding was consistent with earlier bulk biochemical assays from the Camerini-Otero group (49) showing that 8 nt of homology was sufficient for stable dsDNA capture by the RecA presynaptic complex.
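As a rough worked example of how a one-dimensional diffusion coefficient relates to a sliding distance, the sketch below takes the quoted coefficient as 0.9 × 10⁻³ μm²/s, converts it to bp²/s, and evaluates the root-mean-square displacement sqrt(2Dt). The rise per base pair (0.51 nm, i.e. the 0.34 nm B-form rise stretched by ~50%) and the function names are assumptions made for illustration; they are not values or methods taken from the cited studies.

```python
import math

NM_PER_UM = 1000.0


def diffusion_um2_to_bp2(d_um2_per_s: float, rise_nm_per_bp: float) -> float:
    """Convert a 1D diffusion coefficient from um^2/s to bp^2/s."""
    bp_per_um = NM_PER_UM / rise_nm_per_bp
    return d_um2_per_s * bp_per_um ** 2


def rms_displacement_bp(d_bp2_per_s: float, time_s: float) -> float:
    """Root-mean-square displacement for 1D diffusion: sqrt(2 * D * t)."""
    return math.sqrt(2.0 * d_bp2_per_s * time_s)


def time_to_diffuse_s(distance_bp: float, d_bp2_per_s: float) -> float:
    """Time at which the RMS displacement equals distance_bp."""
    return distance_bp ** 2 / (2.0 * d_bp2_per_s)


D_UM2_PER_S = 0.9e-3    # diffusion coefficient as quoted in the text
RISE_NM_PER_BP = 0.51   # ASSUMPTION: 0.34 nm B-form rise stretched by ~50%

if __name__ == "__main__":
    d_bp = diffusion_um2_to_bp2(D_UM2_PER_S, RISE_NM_PER_BP)
    print(f"D ~ {d_bp:.0f} bp^2/s")
    for distance in (60, 300):
        t = time_to_diffuse_s(distance, d_bp)
        print(f"RMS displacement of {distance} bp reached after ~{t:.2f} s")
```

Under these assumptions, the 60-300 bp range corresponds to diffusion times from a fraction of a second to roughly ten seconds, and a different assumed rise per base pair shifts these numbers accordingly.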
Most recently, my colleagues and I have used DNA curtains to visualize Saccharomyces cerevisiae Rad51, human Rad51, S. cerevisiae Dmc1, human Dmc1, and E. coli RecA presynaptic filaments as they sampled short (70-bp) fluorescently tagged dsDNA molecules with short tracts of sequence microhomology complementary to the presynaptic ssDNA (Fig. 2C) (55-57). These studies provided further evidence that stable dsDNA binding required an 8-nt tract of microhomology, suggesting that this 8-nt requirement is a fundamental feature of the Rad51/RecA family of recombinases. This work also revealed that all five recombinases rapidly sample and reject sequences bearing fewer than 8 nt of microhomology. These intermediates exhibited power-law kinetics, implying the existence of a diverse ensemble of nonspecific binding states, rather than a single nonspecifically bound species, as the presynaptic complex interrogates different nonhomologous sequences. A key conclusion from this work, also predicted from several theoretical studies (see below), was that the Rad51/RecA family of recombinases reduces search complexity by confining stable search intermediates to defined length tracts of microhomology that have a high probability of being the "correct" homologous target (Fig. 3A).
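To illustrate why power-law kinetics point toward a heterogeneous ensemble rather than a single bound species, the sketch below compares the survival probability of one state with a single dissociation rate, exp(-kt), against an ensemble whose dissociation rates are spread over a gamma distribution. The gamma distribution and all parameter values are assumptions chosen only because the ensemble average then has a simple closed form, (1 + t/rate)^(-shape), which decays as a power law at long times; nothing here is fitted to the data in the cited work.

```python
import math
import random


def single_state_survival(t: float, k: float) -> float:
    """Fraction still bound at time t for one state with dissociation rate k."""
    return math.exp(-k * t)


def ensemble_survival(t: float, shape: float, rate: float) -> float:
    """Closed-form average of exp(-k*t) over gamma-distributed k.

    For k ~ Gamma(shape, rate) this equals (1 + t/rate)**(-shape),
    which falls off as a power law once t is much larger than the
    rate parameter.
    """
    return (1.0 + t / rate) ** (-shape)


def ensemble_survival_mc(t: float, shape: float, rate: float,
                         n: int = 20_000, seed: int = 0) -> float:
    """Monte Carlo check of the closed form above."""
    rng = random.Random(seed)
    # random.gammavariate expects a scale parameter, i.e. 1 / rate.
    total = sum(math.exp(-rng.gammavariate(shape, 1.0 / rate) * t)
                for _ in range(n))
    return total / n


SHAPE, RATE = 2.0, 1.0  # ASSUMPTION: illustrative values only

if __name__ == "__main__":
    mean_k = SHAPE / RATE  # single state given the ensemble's mean rate
    for t in (0.1, 1.0, 10.0, 100.0):
        print(f"t={t:6.1f}  single={single_state_survival(t, mean_k):.3e}  "
              f"ensemble={ensemble_survival(t, SHAPE, RATE):.3e}")
```

At long times the single-state curve collapses exponentially while the ensemble retains a slowly decaying tail, which is the qualitative signature that distinguishes the two pictures.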
These single molecule studies, together with the preceding bulk biochemical studies, help present a unified picture of the basic principles that guide sequence alignment during the early phases of genetic recombination involving a combination of (i) intersegmental transfer, (ii) short-distance one-dimensional sliding, and (iii) a reduction in search complexity allowing the presynaptic complex to kinetically ignore tracts of microhomology less than 8 nt in length (Fig. 3B).
These studies also point toward new avenues of investigation that should help expand our understanding of the homology search. For instance, does ATP hydrolysis play any role at all in the homology search? What are the orientation and footprint of the dsDNA bound during each sliding event? How does the stiffness of the presynaptic complex, which has a persistence length of ~800 nm, impact its interactions with dsDNA during intersegmental transfer? How does chromosome structure impact the search, and how might the search be augmented by the presence of other accessory proteins that are necessary for recombination? What is the temporal relationship between DNA end processing, presynaptic filament assembly, and the homology search? Can the search take place concurrently with end processing (i.e. as the recombinase is being loaded onto the ssDNA), or are these truly separate stages of the reaction, with the search only taking place once end processing is completed? All of these questions might be accessible to single molecule approaches, or other emerging technologies.

Theoretical Considerations
Despite recent experimental advances, we still do not have a full grasp of the molecular details that make genetic recombination possible. For instance, we do not understand how recombinases access sequence information within a bound dsDNA molecule and then compare this information to the presynaptic ssDNA. Although base flipping is likely to be involved (58, 59), exactly how this might take place remains poorly understood. Again, our lack of understanding of this process can be traced to the transient and heterogeneous nature of the underlying intermediates, and the relevant intermediates cannot be accessed by typical experimental methodologies. For this reason, it will be crucial to integrate experimental findings with detailed theoretical models that can help reveal facets of genetic recombination that are otherwise inaccessible. Here I very briefly touch upon a few examples of how theory and computational modeling have impacted our understanding of the homology search.
Figure 3 legend (partial): Sequences ≤7 nt in length can be ignored during the homology search, whereas longer sequences, which are far less abundant, are more carefully scrutinized. Adapted with permission from Ref. 55. B, the search process involves multiple contact points between the presynaptic complex and the dsDNA molecule that is being interrogated for homology. The most stable contacts bear ≥8 nt of microhomology, allowing the presynaptic complex to probe flanking sequences for additional homology. Contacts with ≤7 nt of microhomology are rapidly released, ensuring that the search is focused only on sequences with a high probability of being the correct target. Each individual contact can slide short distances, and release of a single contact allows the presynaptic complex to sample a different region of the dsDNA for sequence homology. For clarity, the proteins and flanking dsDNA are omitted from the presynaptic complex.
Several experimental studies have reported an 8-nt minimum length requirement for stable dsDNA binding by the presynaptic complex (49, 53, 55), raising the question of "Why eight?" The importance of length-dependent sequence recognition as a means for rapidly rejecting incorrect sequences was first recognized by Charles Thomas, Jr. (60), who in 1966 suggested that it would be beneficial for recombination to take place between nonrepetitive minimal recognition lengths composed of "words" that can be uniquely identified within the genome, and these original concepts have been further elaborated by several groups (55, 61-63). Simply put, longer sequences are less common than shorter sequences, so there is a benefit to search for these longer sequences, which will also have a higher probability of being the correct homologous target (Fig. 3A). These simple mathematical considerations illustrate why minimal recognition lengths can be beneficial, but they do not address why 8 nt seems to be preferred.
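The counting argument above can be made concrete with a minimal sketch. It assumes an idealized random genome with independent, equiprobable bases, so that one specific n-nt word is expected to occur about (number of positions)/4^n times. The genome size used (roughly that of S. cerevisiae) and the function name are illustrative assumptions, and real genomes, with repeats and skewed base composition, will deviate from these estimates.

```python
def expected_occurrences(word_length: int, genome_bp: float, strands: int = 2) -> float:
    """Expected number of exact matches to one specific word in a random genome.

    ASSUMPTION: bases are independent and equiprobable (an idealization).
    """
    if word_length < 1:
        raise ValueError("word_length must be >= 1")
    # Number of start positions at which the word could begin, on each strand.
    positions = strands * max(genome_bp - word_length + 1, 0)
    return positions / 4 ** word_length


GENOME_BP = 1.2e7  # ASSUMPTION: approximately the S. cerevisiae genome size

if __name__ == "__main__":
    for n in range(5, 13):
        hits = expected_occurrences(n, GENOME_BP)
        print(f"{n:2d} nt: ~{hits:,.1f} expected chance matches")
```

Each added nucleotide cuts the expected number of chance matches about fourfold, which is the dilution effect described above. As the text notes, this counting alone does not explain why 8 nt in particular is preferred.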
A compelling physical explanation for this length-dependent recognition mechanism comes from the recent work of Prentiss and colleagues (39, 64). Using molecular dynamics simulations, these researchers identified an initial binding intermediate in which two adjacent bases, each from three successive triplets (i.e. eight contiguous bases in total) within a dsDNA molecule, had flipped open and paired with the presynaptic ssDNA (Fig. 4A). The third base of each triplet remained unpaired because a conserved DNA-binding loop within the RecA presynaptic complex (Loop 2, or L2) blocked formation of a complete triplet. This result provides crucial insights into the structure of the initial binding intermediate and also suggests an explanation for why 8 nt is the preferred minimal recognition length. In addition, this work indicates that the L2 loop must reorient to allow triplet formation. It is tempting to speculate that initial pairing with the correct 8-nt site will result in the reorientation of L2 and that this conformational transition may propagate along the interior of the presynaptic complex, ensuring that subsequent base triplet interactions can form unimpeded as longer regions of DNA are paired by the recombinases. Important insights also arise from considering how the extended structure of RS-DNA may impact the mechanism by which the presynaptic complex will interrogate sequence information buried within dsDNA. For instance, the modeling and experimental studies of Prentiss and colleagues (65,66) have revealed that upon binding to the presynaptic complex, the two strands of the incoming dsDNA will be stretched to differing extents: the noncomplementary strand will be stretched a bit more than the complementary strand. This unequal extension would help open the dsDNA to promote pairing interactions between the complementary strand and the presynaptic ssDNA.
Savir and Tlusty (67) have also examined the potential relationship between RS-DNA extension and pairing fidelity. Their work suggests that the deformation necessary to extend B-DNA to match the extension of RS-DNA may enhance the ability of the presynaptic complex to reject nonhomologous sequences during the homology search. They propose that the energetic penalty associated with dsDNA extension reduces the binding free energies of successive sets of base triplets, and that this deformation-dependent reduction in binding free energy is optimized against the gain in binding free energy for a correct triplet. In contrast, mispaired triplets are not sufficiently stable to overcome the energetic penalty associated with dsDNA extension, allowing the presynaptic complex to quickly discriminate against nonhomologous triplets.
These few simple examples help illustrate how theory and modeling can provide crucial insights into recombination that cannot necessarily be accessed by experiment, and the continued combination of theory and experiment will be necessary for developing a deeper understanding of genetic recombination.

Homology Searches in Living Cells
Recent studies have begun probing how recombination takes place in vivo, and these experiments serve to underscore that regulatory cofactors, accessory proteins, chromatin, and chromosome organization must impact the homology search in ways that we do not yet fully understand. Several excellent reviews are available describing issues related to the in vivo homology search (36-38), and here I briefly describe a few interesting observations that perhaps warrant further exploration.
Although nearly all in vitro studies of the homology search use just a single protein (i.e. RecA, Rad51, or Dmc1), it is almost certain that the presynaptic complex present in living cells will be bound by other recombination proteins. Therefore it will be essential to more fully define the length and protein composition of the "search entity" that exists within living cells. For instance, the eukaryotic protein Rad54, which is an ATP-dependent DNA translocase, is just one example of a protein known to interact with the presynaptic complex (68-70). Rad54 and the closely related protein Rdh54 perform multiple functions during the different stages of recombination (68,71,72). Using time-resolved ChIP assays, the Jentsch laboratory (73) has recently provided evidence that Rad54 and Rdh54 are necessary for the homology search in S. cerevisiae. In the presence of Rad54/Rdh54, the Rad51 presynaptic complex sampled sites throughout a broken chromosome, but the homology search failed in the absence of the two translocases. One interpretation of these findings is that the motor activities of Rad54 and Rdh54 may promote ATP-driven unidirectional motion of the presynaptic complex along duplex regions of the genome while searching for homology (73). However, there may be alternative role(s) for Rad54/Rdh54 in these early stages of recombination: for instance, studies have implicated these proteins as contributing to the stability of the presynaptic complex (74), promoting nucleosome remodeling during recombination (75-77), and altering DNA topology to enhance strand unwinding (78-81). Future work will be essential to fully understand how all the activities of these proteins contribute to recombination.
Interestingly, Rad54/Rad51-mediated search is accompanied by phosphorylation of histone H2AX (73,82). The simplest interpretation of these data is that the corresponding kinase might be an integral component of the presynaptic complex, but why it would phosphorylate sites of the chromosome that have already been searched (and rejected) remains unclear.
Recent studies have revealed that fluorescently tagged DSBs within S. cerevisiae can undergo highly dynamic diffusive motion, and this increased mobility is thought to reflect enhanced movement of the processed DNA ends as they explore the nuclear volume while looking for the homologous template (Fig. 4B) (83-87). The local movement of the DSB is also accompanied by increased mobility of all the chromosomes throughout the nucleus. Interestingly, there is some evidence suggesting that the two ends generated by a single DSB appear to remain associated with one another during the search process, which may help coordinate the repair process by ensuring that both DSB ends locate the homologous target at the same time (86). DSB mobility is dependent upon both Rad51 and Rad54, suggesting that it reflects the movement of a fully assembled and actively searching presynaptic complex.
In addition to diffusive motion, the Greenberg group (88) has reported evidence for unidirectional motion of the presynaptic complex during Rad51-dependent recombination at telomeres in mammalian cells. Surprisingly, this movement occurred over distances ranging from ~1.3-5 μm and was dependent upon the Hop2-Mnd1 protein, which is not normally considered to be involved in mitotic recombination, but is instead required for recombination during meiosis. All of these studies raise crucial and interesting questions with respect to what factors influence DSB mobility in living cells and how changes in mobility might be related to the homology search.
Eukaryotic chromosomes are not randomly organized, but instead appear to have preferred positions within the nucleus, and the existence of this higher-order spatial organization has important implications in replication, transcription, and recombination (89-92). The Haber laboratory (93) has recently begun examining how the spatial organization of the S. cerevisiae genome might impact homologous recombination. For these experiments, they generated yeast strains bearing a single DSB that could be repaired using a template that was positioned at one of 24 distinct locations throughout the genome. This work revealed that repair efficiency was strongly influenced by spatial proximity: DSBs were more readily repaired when in close spatial proximity to the repair template, but more difficult to repair when further away from the template in three-dimensional space. These findings suggest that the homology search itself may be the rate-limiting step in homologous recombination. They also raise interesting questions regarding how the physiological state of the cell might impact both chromosome organization and recombination, whether cell-to-cell variability in chromosome organization could influence recombination, and if so, whether such differences might be heritable.
As a final note, the Sherratt laboratory (94) has shown that DSBs trigger formation of large RecA bundles in E. coli (Fig. 4D). The constituent proteins are not all bound to ssDNA, but instead form a scaffold along the inner membrane that guides the DSB ends toward the homologous locus through an ATP hydrolysis-dependent mechanism (94,95). These findings raise crucial questions for how the homology search might be taking place within the context of RecA bundles and to what extent the in vitro homology search reflects the in vivo search process. This finding also raises the question of whether similar protein scaffolds may be involved in eukaryotic mitotic recombination, meiotic recombination, or both. It will be crucial to recapitulate these fascinating RecA bundles in an in vitro setting to determine how they affect the homology search and subsequent strand exchange.

Conclusion
We are beginning to grasp the basic principles of the homology search, but the problem is far from solved, and future advances will likely require contributions from a number of distinct disciplines, including cell biology, biochemistry, biophysics, and molecular modeling approaches. Emergent technologies such as real-time super-resolution in vivo imaging (96) and high-resolution cryo-EM (97) also offer the potential for many new insights. How is pairing fidelity controlled, to what extent are degenerate sequences tolerated, and how might partner proteins affect these features? How does recombination intersect with replication and other repair pathways? What are the fundamental differences between mitotic and meiotic recombination, and how do these differences contribute to identification of the correct homologous template? Answers to some of these questions may be forthcoming, whereas others may prove more challenging. Of course, the other stages of recombination (e.g. end processing, strand invasion, choice of pathway, etc.) remain just as interesting as the homology search and are in many ways still just as mystifying. This abundance of open questions will help ensure that homologous recombination remains a fruitful area of scientific inquiry.