The L1 element (LINE-1, long interspersed repeated DNA) is the
mammalian version of the non-long terminal repeat class of transposable
elements that replicate via an RNA intermediate
(retrotransposons) (1). Every modern mammalian species studied
to date contains a distinctive L1 family consisting of tens of
thousands of members, which are interspersed throughout the genome.
Despite their distinctiveness, all full-length mammalian L1 elements
share the same organization: a 5`-UTR,
Figure 1: The L1 retrotransposable element. Generic mammalian L1 element. reg and G-rich Pu:Py sequence denote regulatory sequence and
a guanine-rich polypurine:polypyrimidine sequence, respectively. See
text for more details.
Each of the modern L1 families
evolved independently in the various mammalian lineages from a common
ancestral L1 element that dates back to sometime before the mammalian
radiation
In spite of their prominence, most of the
biochemical and molecular details of L1 regulation, replication, and
transposition remain unknown. To a large extent, what is known has been
derived from evolutionary studies, and these have yielded two kinds of
information. The first is derived from comparisons between different
mammalian L1 families or between L1 elements and their counterparts in
other organisms. This comparative biochemical approach identified and
assigned possible functional significance to different features of
non-long terminal repeat retrotransposons. The second type of
information, generated by the analytical techniques of evolutionary
biology, revealed the evolutionary dynamics of L1 families. These
studies suggest that L1 evolution is a paradigm for a novel, but as yet
incompletely understood, evolutionary process that is taking place
within the ``ecosystem'' of the mammalian genome and that L1
evolution is quite dynamic, with novel L1 variants continually emerging
over relatively short periods of time. As a consequence, L1 evolution
has generated a rather complex family structure, and it has become
apparent that this feature of L1 evolution can be exploited to examine
the evolutionary (phylogenetic) history of the mammalian hosts that
harbor these elements (12, 13, 14, 15, 16).
It is this last aspect of L1 biology that will be the focus of this
review. By way of introduction, we will briefly summarize some results
derived from the comparative biochemical analysis and the evolutionary
studies of L1 families.
Comparative Biochemistry of L1 Elements
Evolutionary comparisons have shown that the L1 RT is
seemingly of very ancient lineage since transposable elements encoding
a homologous protein have been found in bacteria, Group II introns,
plants, fungi, and invertebrates (1). Elegant biochemical
studies on the L1-like RTs from invertebrates including insects, fungi,
some Group II introns, and bacteria revealed several intriguing
mechanistic properties of this class of RT, which may bear directly on
the biochemical properties of the L1 RT. Although this is the subject
of a recent review (17), two properties of the RT are worth
mentioning here. First, efficient cDNA synthesis by the RT depends on
recognition of a structural feature near the 3`-end of the transposon
transcript (10, 18, 19, 20, 21).
Second, the RT of the L1-like R2Bm element of Bombyx mori tends to incorporate non-templated bases (mainly, but not only,
As) at the 3`-end of the transposed cDNA (21). These properties
could explain two evolutionarily conserved features of the mammalian L1
3`-UTR. The first is a G-rich polypurine stretch, which can form
various unusual folded structures whether present as DNA (22, 23, 24) or as RNA.
One of the more striking
findings revealed by the comparisons of different mammalian L1 families
is that, in contrast to the rest of the element, the 5`-UTRs of even
very closely related L1 families are not
homologous (29, 30, 31, 32, 33).
This indicates that the evolutionary origin of the 5`-UTR region is
independent of the rest of the L1 element and that novel 5`-UTRs have
been repeatedly acquired by the various mammalian L1 families. Since
the 5`-UTR includes a region that has regulatory
properties (34, 35, 36, 37, 38),
the repeated acquisition of a novel regulatory sequence could be a
means whereby the element bypasses either inactivating mutations in the
L1 element (38) or a host-encoded repressive mechanism. Either
explanation is consistent with the fact that sense strand-specific L1
transcripts are produced mainly from the most recently evolved L1
elements (39, 40). Although the evolutionary source
for the novel L1 regulatory sequences is not known, they share certain
sequence features with viral and housekeeping promoters in that they
are CpG islands (41, 42) and lack many of the
traditional transcription factor binding motifs found in RNA polymerase
II promoters (e.g. TATA and CCAAT boxes).
The Evolutionary Dynamics of L1 Families
L1 replication generates two types of progeny:
replication-competent copies and, in far greater numbers, defective
copies, e.g. 5`-truncated, rearranged,
etc. (25, 26, 29). For the most part, these
defective copies were neither excised (4, 5, 11, 12) nor homogenized by
postreplicative events such as gene conversion (11, 12, 43, 44, 45, 46) but
have diverged from each other due to the accumulation of random
mutations over time. Therefore, the extent of divergence between
members of any particular family serves as a built-in
``carbon'' dating mechanism whereby the time of amplification
can be estimated, i.e. the more divergent the family, the
older it is. Among the replication-competent copies, novel variants
were also produced, and these in turn generated both defective and yet
newer versions of non-defective
elements (11, 47, 48, 49). Variant
elements can rapidly succeed each other (31, 32, 50) and also
co-exist (6, 11, 15, 49, 51),
perhaps competing with each other (46).
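The built-in dating logic described above can be made concrete with a short sketch. It is an illustration only and not the procedure used in the studies cited: the Jukes-Cantor correction, the function names, the toy sequences, and the value of ASSUMED_RATE are all assumptions introduced here. The sketch treats the defective copies of a family as having diverged independently since amplification, so that two copies differ on average by about twice the neutral substitution rate multiplied by the elapsed time.

```python
import math
from itertools import combinations

# Hypothetical neutral substitution rate (substitutions per site per year).
# This is a placeholder chosen for illustration, not a value from the text.
ASSUMED_RATE = 5e-9


def p_distance(a, b):
    """Fraction of differing sites between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(a, b) if x != y) / len(a)


def jukes_cantor(p):
    """Correct an observed proportion of differences for multiple hits."""
    if not 0 <= p < 0.75:
        raise ValueError("Jukes-Cantor distance is undefined for p >= 0.75")
    return -0.75 * math.log(1 - 4 * p / 3)


def estimate_family_age(copies, rate):
    """Estimate years since amplification from mean pairwise divergence.

    Each copy accumulates changes independently, so the expected distance
    between two copies is about 2 * rate * time.
    """
    pairs = list(combinations(copies, 2))
    if not pairs:
        raise ValueError("need at least two copies")
    mean_p = sum(p_distance(a, b) for a, b in pairs) / len(pairs)
    return jukes_cantor(mean_p) / (2 * rate)


# Toy aligned relics (hypothetical): a recently amplified family and an older one.
young_family = [
    "ACGTACGTACGTACGTACGT",
    "ACGTACGTACGTACGTACGA",
    "ACGTACGTACGTACGTACGT",
]
old_family = [
    "ACGTACGTACGTACGTACGT",
    "ACGAACGTTCGTACGTACGA",
    "TCGTACGTACGAACGTCCGT",
]

for name, family in (("young", young_family), ("old", old_family)):
    age = estimate_family_age(family, ASSUMED_RATE)
    print(f"{name} family: roughly {age:.3g} years under the assumed rate")
```

The absolute ages depend entirely on the assumed rate; only the ordering (more divergent means older) follows from the argument in the text.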
Using L1 DNA as a Phylogenetic Character
Establishing a correct phylogeny, i.e. the unique
tree that describes the genealogy of the taxa in question, is essential
if either studies on evolutionary processes or comparative biochemical
studies are to be meaningful. However, determining the correct
phylogenetic tree can be extremely difficult (e.g. see (52, 53, 54, 55, 56)).
Taxa are grouped on the basis of shared characters, and sometimes it is
impossible to determine whether a shared character has been inherited
from a common ancestor or whether it arose independently due to
convergence, parallelisms, or reversion to an ancestral state.
Non-inherited shared characters are called homoplasies, and they can
lead to multiple, equally likely phylogenetic trees or, in extreme
cases, a single incorrect tree. A lucid elaboration of the difficulties
caused by homoplasy can be found in (55). An additional
problem encountered in phylogenetic analysis is determining whether a
shared character has been recently acquired (derived) or is an
ancestral (primitive) one that was retained by the modern taxa. This
becomes a problem if different taxa have undergone different rates of
evolution. For example, when species that share a common ancestor
evolve at different rates, then the slower evolving ones will retain
more of the ancestral characters than the faster evolving ones, and the
slower and faster evolving species could be grouped separately even
though they share a common ancestor. If we consider the presence or
absence of an amplified L1 clade (i.e. family or
subfamily
Examples of Using L1 as a Phylogenetic Character
The use of L1 DNA as a phylogenetic character is relatively
simple in both principle and practice and depends on obtaining enough
DNA sequence information to prepare clade-specific hybridization
probes. Although probes cognate to any region of L1 DNA can be used (e.g. see below and the legend to Fig. 2), those
specific to the 3`-UTR are most generally useful, especially for
recently evolved clades. This is because the 3`-UTR evolves for the
most part more rapidly than most of ORF I and all of ORF II (e.g. Refs. 5, 12, 13) and is not replaced wholesale during evolution as
can be the case for the 5`-UTR (see ``Comparative Biochemistry of
L1 Elements''). In spite of the relatively rapid evolutionary
change in the 3`-UTR, clades that are as old as 12-15 million
years can be readily distinguished (see below).
Figure 2: Distribution of L1 clades in various
rodents. A, diagrammatic representation of the presence or
absence of an ancient murine L1 clade, Lx, and several modern rat
clades, L1
For older L1 clades,
we have found probes of
Hybridizations are most conveniently carried out using dot blots of
genomic DNA. However, hybridization to blots of electrophoretically
separated fragments of genomic DNA that had been digested with
restriction endonucleases, which recognize conserved sites within the
3`-UTR, greatly increases both the specificity and sensitivity of the
method. The appearance of novel restriction fragments is indicative of
subdivisions within a given clade due to the loss or gain of a
particular restriction enzyme site. Therefore, a shared novel
restriction fragment detected even by a probe specific for just a
single base difference would be highly specific for a given clade. This
is because the presence of the novel restriction fragment would have
required at least two base changes: the one detected by the
oligonucleotide and the one that created or destroyed a given
restriction enzyme site. The sensitivity of the method is increased
because the presence of subdivisions within a given clade could be
evidence of recently evolved (or evolving) L1 clades. In the two
sections below we demonstrate the use of L1 as a phylogenetic character
to examine an evolutionary event that occurred about 12 Ma and one that
began 1-3 Ma.
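The argument about novel restriction fragments can be illustrated with a small in silico digest. This is a sketch under stated assumptions: the sequences are invented, the recognition sequence GAATTC (that of EcoRI) is chosen only as a familiar example and is not taken from the text, and the cut is placed at the first base of each site for simplicity. It shows how a single base change that destroys a site replaces two ancestral fragments with one novel, longer fragment.

```python
def digest(seq, site):
    """Return fragment lengths after cutting at the start of every occurrence of site.

    Simplification: the cut is placed at the first base of the recognition
    sequence rather than at the enzyme's true cleavage position.
    """
    cuts = []
    i = seq.find(site)
    while i != -1:
        cuts.append(i)
        i = seq.find(site, i + 1)
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:]) if b > a]


SITE = "GAATTC"  # example recognition sequence (assumption, not from the text)

# Hypothetical 3`-UTR segment carrying two conserved sites.
ancestral = "AAAA" + "GAATTC" + "CCCCCCCC" + "GAATTC" + "TTTT"
# Hypothetical subclade in which one base change destroys the second site.
variant = "AAAA" + "GAATTC" + "CCCCCCCC" + "GAGTTC" + "TTTT"

print("ancestral fragments:", digest(ancestral, SITE))
print("variant fragments:  ", digest(variant, SITE))
```

A probe specific for a second diagnostic base within the novel fragment would then require two independent changes to give a shared signal, which is the source of the added specificity noted above.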
Phylogenetic Analysis Using an Ancient Murine L1 Clade
Murinae, a rodent subfamily, which includes Old World rats (Rattus) and mice (Mus) and many other genera, first
appeared 12-15 Ma. The classification of Murinae is traditionally
based on several cranial and dental characters (57) and in a
number of cases has been problematic (58). A few years ago we
discovered the relics of an ancient L1 clade (referred to as Lx) in the
genomes of mice and rats (11, 12, 15). Based
on the extent of nucleotide divergence between Lx members and the
murine neutral nucleotide substitution rate, we estimated that the Lx
amplification coincided with the murine radiation (15).
Therefore, we expected that the relic copies of Lx would be present in
all modern day murines but absent from non-murine taxa. We found Lx
to be present in 24 unambiguously classified murine species and absent
from 13 unambiguously classified non-murine
species (11, 15). Of particular interest was our
finding that the Lx amplification was absent from three taxa, Lophuromys, Uranomys, and Acomys, that were
traditionally classified as murines (58). Our data suggested
that the classification of these species was incorrect, and indeed
their inclusion in Murinae has at times been challenged (e.g. see (59) and references therein). Subsequent
re-examination of the morphological data and both single copy DNA
hybridization data (59) and 12 S mitochondrial rRNA sequence
analysis (60) have now further supported the exclusion of these
taxa from Murinae. Therefore, the murine-like dental pattern of the (Lophuromys, Uranomys, Acomys) clade, which
in part formed the basis of their classification as murines, is quite
likely a homoplasy due to convergence. The above results indicated
that the Lx amplification is an acquired taxon-defining character, or
synapomorphy, for the subfamily Murinae. We further tested this
supposition by re-examining the classification of Otomys. The
animals in this genus, commonly called African vlei rats, were
traditionally classified in their own subfamily, Otomyinae, of equal
rank to Murinae (58). However, this classification did not
accommodate the presence of a transitional fossil form between an
ancestral murine species and present day Otomys. This fossil
of the now extinct Euryotomys was dated from 6.0 to 4.5
Ma (61), well after the murine radiation, and its existence
suggested that the Otomyinae were murines. If true, then the Otomyinae
species should contain Lx DNA, and this turned out to be the
case (16). Recent single copy DNA hybridization data (62) also support the reclassification of these animals as
murines. Therefore, using the absence or presence of Lx DNA as a
phylogenetic character helped resolve two problems in rodent phylogeny.
The distribution of Lx in murine and non-murine species is summarized
in Fig. 2.
Phylogenetic Analysis with Modern L1 Clades
The distribution of recently amplified L1 clades can be used
to resolve the taxonomy of more recently diverged animals. The genus Rattus contains about 50 species considered to be Rattus sensu strictu. Single copy DNA hybridization is unable to
establish a branching pattern for many of these species, and the
systematics of this group remains largely
unresolved (16, 58). We can distinguish at least five
relatively modern L1 clades in Rattus norvegicus.
By contrast, two younger rat L1 clades, L1
The L1
Studies on L1 DNA of Mus have
revealed a similar picture of L1 evolution and have demonstrated the
usefulness of L1 DNA as a phylogenetic character in this taxon.
Species-specific L1 clades distinguish Mus domesticus and Mus spretus (13) and have been used to detect M.
spretus genomic sequences present in an inbred strain of Mus
musculus (63). Additionally, recent work on modern M.
spretus L1 DNA has revealed emerging and apparently competing L1
clades that may be useful in defining subpopulations of this species as
well (46, 49). Humans also contain a very complex L1
DNA composition (5) including a number of distinct
replication-competent L1 clades (6, 51).
Concluding Remarks
As a consequence of their long replicative history in
mammalian genomes, L1 elements have generated a rich collection of DNA
``fossils'' that can be used to determine the phylogenetic
history of mammals. Here we have shown how the presence (or absence) of
an amplified L1 clade can be used as a novel and robust phylogenetic
character. We should also mention that individual transposition events
can be used for phylogenetic analysis. Batzer et al. (64) showed that the frequency of a SINE insertion at four
different loci in the human genome distinguished human population
groups and used their results to further support the African origin of
modern humans. Comparisons between mammalian
Finally, we would like to close with a
comment about the possible effect of L1 transposition on mammalian
evolution. Because L1 insertions are random and potentially either
beneficial or deleterious, it is easy to visualize how an L1
amplification event introduces genetic diversity into an extant animal
population. Depending on a number of extrinsic (e.g. geographical isolation, population size) and intrinsic (e.g. changes in fitness caused by an L1-induced genetic effect)
factors, a given animal population could become differentiated into
subpopulations as a consequence of the difference between their pattern
of L1 insertions. Moreover, depending on the rate at which novel L1
clades emerge and amplify, it would be quite possible that
subpopulations could also differ by their content of distinct L1
clades, which, depending on the relative transposition rate of the
distinct L1 clades, further enhance the generation of genetic diversity
within the taxon. To the extent that genetic diversity predisposes a
given taxon to speciation, one might entertain the notion that L1
amplification events may have a role in mammalian speciation. In this
regard, we note the apparent correlation, at least during rodent
evolution, between the generation and expansion of novel L1 clades and
a number of speciation/extinction events (see (15) and
references therein).
Volume 270, Number 43, Issue of October 27, 1995 pp. 25301-25304
©1995 by The American Society for Biochemistry and Molecular Biology, Inc.
Fossils and Phylogenetic Analysis
USING L1 (LINE-1, LONG INTERSPERSED REPEATED) DNA TO DETERMINE THE
EVOLUTIONARY HISTORY OF MAMMALS (*)
which includes a
regulatory sequence; ORF I, which encodes a protein of unknown
function; ORF II, which encodes an RT (2); and a 3`-UTR that
contains a G-rich polypurine:polypyrimidine tract and terminates in an
A-rich sequence (Fig. 1).
100 million years ago (3, 4, 5) . Being capable of prodigious
amplification, the modern L1 elements and their evolutionary
antecedents (see below) now account for at least 30% of the mass of
mammalian DNA. In addition, L1 elements are active in present day
species and are a frequent cause of genetic polymorphisms including a
number of non-inherited genetic defects in
humans (6, 7, 8). It is also possible that
the L1 RT catalyzed the retrotransposition of elements that do not
encode their own RT such as the mammalian SINE families (e.g. Alu in primates, B1, B2, ID, etc., in
rodents) (5, 9, 10, 11). Since these
families can reach copy numbers as high as 1
10
and
alone contribute up to 5% of mammalian DNA (e.g. Alu (9)), L1 elements quite likely have had, and continue
to have, a profound effect on the structure, function, and evolution of
mammalian genomes.
In the
latter case such structures could possibly act as a recognition site
for the L1 RT. The second is the A-rich terminus of L1 elements. While
originally thought to have originated as the poly(A) tail of the
retrotranscribed L1 transcript (25, 26), the A-rich
terminus could have been generated during the retrotransposition
process, as has been found for the R2Bm element (21). Such a
mechanism could account for the fact that even recently transposed L1
elements do not always terminate in a pure poly(A) sequence (e.g. see (27) and (28)).
Therefore, a given L1 ``family'' consists of
several closely related L1 subfamilies. Since L1 elements are
transmitted only by inheritance (i.e. vertically) (3, 13, 29, 30, 31, 46),
the L1 DNA composition of each species is unique. Thus, taken in
toto, the L1 content of present day mammalian species is very
complex, encompassing as it does the entire evolutionary history of the
modern L1 elements since their descent from the common mammalian
ancestral L1 element (4, 5, 12).
) as a phylogenetic character, the multicopy state
of the ``L1 character'' renders the issue of homoplasy moot.
Since the relics of a given L1 amplification event share multiple
diagnostic nucleotides, the presence of the same L1 clade in different
taxa could not have occurred by convergent evolution but must be a
shared derived character (referred to as a synapomorphy). Since L1
relics are retained in the genome in high copy number, reversion to the
ancestral state, i.e. the absence of a particular L1 family in
a particular taxon, cannot occur. In addition, the relative
``ages'' (extent of sequence divergence) of L1 clades are
easily determined. Therefore, the problems of both homoplasy and of
whether a character is a retained primitive or a newly acquired one are
circumvented when L1 DNA is used as a phylogenetic character.
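The reasoning about shared derived characters can be sketched as a simple compatibility check. This is an illustration under assumptions, not a method from the studies cited: two presence/absence characters can sit on the same tree without homoplasy only if the sets of taxa carrying them are nested or disjoint. The Lx distribution below follows the findings summarized in the section on the ancient murine L1 clade. The dental set is a simplified reading of the traditional classification, and the younger clade is hypothetical.

```python
from itertools import combinations

# Taxa carrying each character.
characters = {
    # Lx relics: present in murines and Otomys, absent from the other three genera.
    "Lx relics": {"Rattus", "Mus", "Otomys"},
    # Simplified assumption: the murine-like dental pattern used in the
    # traditional classification.
    "murine-like dentition": {"Rattus", "Mus", "Acomys", "Lophuromys", "Uranomys"},
    # Hypothetical younger L1 clade restricted to one genus.
    "younger L1 clade": {"Rattus"},
}


def compatible(a, b):
    """True if two characters can map onto one tree without homoplasy."""
    return a <= b or b <= a or a.isdisjoint(b)


def conflicts(chars):
    """List every pair of character names whose taxon sets are incompatible."""
    return [
        (n1, n2)
        for (n1, s1), (n2, s2) in combinations(chars.items(), 2)
        if not compatible(s1, s2)
    ]


for pair in conflicts(characters):
    print("conflict:", pair)
```

The check only flags that two characters disagree. Deciding which one is homoplastic rests on the argument in the text: a multicopy L1 clade defined by several diagnostic nucleotides is very unlikely to arise twice or to be lost, so the conflicting single-state character is the more likely homoplasy.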
, L1
, and L1. The
original data were presented in Refs. 12, 15, and 16. B,
distribution of two newly evolved clades of L1
:
L1
and L1
. These families were
distinguished on the basis of differences between a hypervariable
region that we recently discovered in ORF I. The darkness of
the gray filled circles is related to the
amount of the indicated subfamilies in R. norvegicus and R. rattus moluccarius, where the sum of the rn and mol clades
in R. norvegicus is about the same as the mol clade in R.
rattus moluccarius (E. Cabot, B. Angeletti, B. Hayward, K. Usdin,
and A. V. Furano, manuscript in
preparation).
200 base pairs to be both specific yet
long enough to hybridize efficiently to the divergent members of a
given clade. For the younger families oligonucleotide probes are
essential. Oligonucleotide probes of 20 bases cognate to regions
of clades that differ by 2 or more diagnostic nucleotides are ideal. In
cases where the multiple diagnostic base differences between clades are
further apart than can be accommodated on a single oligonucleotide more
than one oligonucleotide should be used to eliminate the possibility
that the shared hybridization signal is due to chance mutation in
precisely the same position in two otherwise different clades (but see
below). We have obtained excellent discrimination using
oligonucleotides to probe for a single base difference as long as the
difference resides in the middle of the oligonucleotide and the
hybridization is carried out in the presence of a large excess of the
appropriate competitor oligonucleotide, i.e. one that has the
same sequence as the probe except for the distinguishing base change.
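The probe-design rules above can be sketched in a few lines. The sketch is hypothetical throughout: the two consensus sequences, the function names, and the scoring rule are assumptions made for illustration. Given two aligned clade consensus sequences, it picks the 20-base window on the target clade that covers the most diagnostic positions, prefers windows in which those positions lie near the middle, and returns the matching competitor oligonucleotide taken from the other clade.

```python
def diagnostic_positions(target, other):
    """Indices at which two aligned consensus sequences differ."""
    if len(target) != len(other):
        raise ValueError("consensus sequences must be aligned to the same length")
    return [i for i, (x, y) in enumerate(zip(target, other)) if x != y]


def choose_probe(target, other, length=20):
    """Pick a probe on target and the competitor oligonucleotide from other.

    Windows are ranked first by the number of diagnostic positions they
    cover, then by how close those positions lie to the window centre.
    Returns (probe, competitor, start, covered) or None if the clades
    do not differ.
    """
    diffs = diagnostic_positions(target, other)
    if len(target) < length:
        raise ValueError("sequence shorter than requested probe length")
    best = None
    for start in range(len(target) - length + 1):
        covered = [i for i in diffs if start <= i < start + length]
        if not covered:
            continue
        centre = start + (length - 1) / 2
        off_centre = sum(abs(i - centre) for i in covered) / len(covered)
        key = (-len(covered), off_centre)
        if best is None or key < best[0]:
            best = (key, start, covered)
    if best is None:
        return None
    _, start, covered = best
    end = start + length
    return target[start:end], other[start:end], start, covered


# Hypothetical aligned consensus sequences differing at positions 18 and 22.
clade_b = "ACGT" * 10
bases = list(clade_b)
bases[18] = "A"
bases[22] = "T"
clade_a = "".join(bases)

probe, competitor, start, covered = choose_probe(clade_a, clade_b)
print("probe:     ", probe)
print("competitor:", competitor)
print("diagnostic positions covered:", covered)
```

When the diagnostic positions are too far apart to fit in one window, the function returns a probe covering only some of them, which corresponds to the case in the text where more than one oligonucleotide should be used.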
One of the older ones, L1, amplified
about 3.5 million years ago when the species comprising Rattus sensu strictu began emerging. As Fig. 2 illustrates, the
L1
clade is present only in animals classified as Rattus sensu strictu (16). Therefore, the L1
clade probably arose in the common ancestor of Rattus sensu strictu some time after the divergence of these animals from
the ancestor they shared with Rattus sensu lato.
and
L1, are present only in R. norvegicus and in
animals identified as Rattus rattus moluccarius, a presumed
subspecies of Rattus rattus (16). Although R.
rattus moluccarius specimens contained both the L1
and
L1 clades, these L1 clades were absent from a number of
other R. rattus specimens (Fig. 2). This result was
quite surprising and suggested that the R. rattus moluccarius specimens were misclassified and represent a sister taxon of R. norvegicus rather than a subspecies of R.
rattus (16). Further analysis using mitochondrial DNA
sequences and our finding that R. norvegicus and R. rattus
moluccarius share a satellite DNA sequence supported this
conclusion (15). Therefore, the L1
and L1 clades are markers for a new taxon within Rattus sensu
strictu; this taxon contains R. norvegicus and R. rattus
moluccarius.
clade has evolved rapidly,
and two descendant clades of L1
can be distinguished:
L1
and L1
. While R. rattus moluccarius contains only the L1
clade, R. norvegicus contains some members of this
clade but far greater numbers of the L1
clade (Fig. 2B). This indicates that the L1
clade either arose in or began amplifying in R.
norvegicus soon after it and R. rattus moluccarius diverged from their common ancestor. Furthermore, it is possible
that the L1
clade may have expanded at the
expense of the L1
clade in the R.
norvegicus genome since this clade has not amplified to the same
extent as the L1
clade in R. norvegicus or as the L1
clade in R. rattus
moluccarius.
These results suggest that very closely
related L1 clades can exclude each other perhaps by competing for
limiting host factors.
-globin loci have
shown that different species can be distinguished by the pattern of L1
insertions at this site (4, 65, 66). For
example, an ancient L1 insertion between the and
genes
distinguishes eutherians (placental mammals) from metatherians
(marsupials) (4, 66), and two independent L1
insertions flank the
-globin gene in simians but not in
prosimians (66). However, independent insertional events could
be problematic for phylogenetic analysis. First, they are much harder
to identify or characterize initially (though once detected, relatively
easy to screen for) than the presence or absence of an amplified L1
clade. Second, any individual insertion or site that is being scored
for the presence of the insertion could be subject to re-arrangement, e.g. deletion of the inserted element. Therefore, both the
problems of homoplasy and of determining whether the character is an
ancestral or acquired one could theoretically afflict the use of
individual insertion events.
We are indebted to Drs. Steven Hardies, Thomas
Eickbush, Morris Goodman, and Arian Smit for providing unpublished data
and to Steven Hardies for his thoughtful comments.