What I got wrong about shelterin

The ASBMB 2018 Bert and Natalie Vallee award in Biomedical Sciences honors our work on shelterin, a protein complex that helps cells distinguish the chromosome ends from sites of DNA damage. Shelterin protects telomeres from all aspects of the DNA damage response, including ATM and ATR serine/threonine kinase signaling and several forms of double-strand break repair. Today, this six-subunit protein complex could easily be identified in a single proteomics step. But it took us more than 15 years to piece together the entire shelterin complex, one protein at a time. Although we did a lot of things right, here I tell the story of shelterin's discovery with an emphasis on the things that I got wrong along the way.


TRF1: Hesitation
Once Harold Varmus and I had cloned human telomeres and found they contained TTAGGG repeats (this was in early 1988, but we were scooped by Robert Moyzis' paper later that year (1)), I wrote in my notebook that I should start looking for a protein that recognizes this sequence. At that time, there was a single precedent for such a protein. David Prescott had identified a protein complex that strongly bound to the ends of gene-sized DNA molecules that make up the macronuclear genome of the ciliate Oxytricha (2). Dan Gottschling (first in Tom Cech's lab and then in Ginger Zakian's lab) had published two papers on Prescott's terminal complex, now called TEBPα/β, showing that it is specific for the 3′ single-stranded (ss) end of the telomeres (3,4). So, although the bulk of the newly identified human telomeric repeats was likely to be in double-stranded form, I thought I should look for a protein that could bind ssTTAGGG repeats.
Lily Shue, a technician who was paid from my Lucille P. Markey Trust award, was tasked with looking for such an activity. She used HeLa nuclear extract provided by Grant Herzog, then a graduate student in Rick Myers' transcription-focused lab and now Lily's husband. Grant tutored Lily and me on gel-shift analysis, making probes, and doing specificity assays. Almost immediately, Lily observed a very abundant activity that bound to ssTTAGGG repeats. But within a few months, Lily delivered devastating news to me at a New Year's Eve party: the activity she was looking at had higher affinity for RNA than DNA. A few years later, first Howard Cooke and then Tom Cech published on the activity that Lily had detected, which turned out to be due to hnRNPA1 and other similar RNA-binding proteins (5,6). So, our New Year's resolution for 1989 was to look for proteins bound to double-stranded (ds) TTAGGG repeats instead. This search yielded a much less abundant binding activity that was specific for DNA, not RNA. We called it TRF (telomeric repeat binding factor).
In June 1989, the Rockefeller University offered me a position, which was a miracle in my view because my main paper on the shortening of human telomeres in somatic cells and cancers had just been rejected from Science (it came out in MCB in 1990 (7)). Upon moving to Rockefeller University in mid-1990, I set out to execute two experiments that I had initiated in Harold Varmus' laboratory. Both were designed to look for in vivo evidence for a TRF-like factor at telomeres. I reasoned that if telomeres contained something like TRF, their nucleosomal structure might be altered. The first experiment, completed with graduate student Henrik Tommerup, indeed showed an unusually blurred nucleosomal (MNase) pattern in the ds part of the telomeres, suggestive of nonhistone proteins associated with the telomeric DNA (8). The second experiment was designed to determine whether telomeres were bound to the nuclear matrix. It was known that certain sites in the genome were bound to the nuclear matrix due to the presence of nonhistone proteins, and I reasoned that if telomeres bound to the nuclear matrix, it would be evidence for the nonhistone proteins being present. I found that telomeres stuck to the nuclear matrix very strongly, again consistent with something like TRF existing in vivo (9).
While I was finishing the chromatin and nuclear matrix experiments, a technician, Shawn Kaplan, and a graduate student, Zhong Zhong, were following up on Lily Shue's TRF activity. Zhong and Shawn determined the biochemical features of TRF, documenting a satisfying preference for TTAGGG repeats over telomeric sequences from other organisms (e.g. TTGGGG repeats from Tetrahymena) (10). But Zhong also showed that TRF had a very high off-rate, raising doubts that TRF could be functional at telomeres given its labile binding, and he left the lab in 1991 to work on the then-new STAT pathway with James Darnell.
In a final attempt to reassure myself that telomeres contained a dsDNA-binding protein, I asked a postdoc, John Hanish, to set up an assay for formation of new telomeres in human cells. The idea was to test whether the sequence requirements for telomere formation were similar to those of TRF. Transfection of linear plasmids bearing a selectable marker and a few hundred base pairs of TTAGGG repeats at one end was known to seed a functional telomere in telomerase-expressing cells. With this assay, we found that the sequence requirements for telomere healing coincided with the stringent sequence preferences of TRF (e.g. no telomere seeding with a TTGGGG array), whereas telomerase extended all the tested repeats in vitro (11) (see "Addendum"). So, these results were also consistent with a TRF activity. All these tests for the involvement of a duplex telomeric DNA-binding factor were initiated because our TRF activity was initially without precedent. But as these experiments started to reassure me, Rap1, a transcription factor that had been discovered in budding yeast by Kim Nasmyth and David Shore, was shown to bind to yeast ds telomeric DNA in vitro (12,13). In 1990, Art Lustig and David Shore as well as Ginger Zakian's lab showed that Rap1 affects the length of telomeres (14,15). Although this effect could have been indirect (resulting from a transcription effect of Rap1), the simpler interpretation was that Rap1 directly bound to the ds telomeric DNA in yeast.
Despite my findings and the work in yeast, I was unable to convince people in my lab to work on TRF after Zhong's departure. I decided to do it myself, together with technician Laura Chong. I had very little experience in biochemistry and none in protein purification. It took us 2 years to figure out how to purify TRF. It involved six columns and delivered two detectable protein bands. Isolation and renaturation of the 60-kDa band showed that this was the TRF activity. Knowing that TRF was in this band was good news, but it also revealed that our yield was devastatingly low. Paul Tempst at Memorial Sloan Kettering Cancer Center had told me he needed 5 μg of a 60-kDa protein to derive peptide sequences by Edman degradation. This meant that we needed to purify TRF from more than 500 liters of HeLa cells, which was a lot more than I could grow in my laboratory. I forked over $17,500 to buy HeLa cells (shipped as pellets on wet ice), a sum close to the supply budget of my single National Institutes of Health (NIH) GM R01 grant. It worked! Paul Tempst was able to derive TRF peptide sequences that allowed Dominique (Kiki) Broccoli and me to isolate the TRF cDNAs from a phage library.
After several months of cDNA cloning and sequencing, Kiki, Laura, and I determined the TRF protein sequence in late 1993. To my disappointment, TRF was not homologous to Rap1. Worse, TRF looked like a transcription factor, with a Myb-type DNA-binding domain at the C terminus and an acidic domain at the N terminus. Had I just spent 2 years and most of my NIH grant on an irrelevant transcription factor with spurious TTAGGG repeat binding activity? Two pieces of information pulled me out of the funk that I was experiencing in early 1994. First, Bas van Steensel used indirect immunofluorescence to show that TRF was present at HeLa cell chromosome ends. Second, I heard from David Shore that Daniela Rhodes at the Laboratory of Molecular Biology (Cambridge, UK) had just determined the crystal structure of Rap1 and had found two Myb-like folds in its DNA-binding domain.
It took us a year to finalize the paper, which came out at the end of 1995, more than 7 years after we first observed TRF (16). If I had not been side-tracked by all the experiments aimed to validate a TRF-like protein in vivo and if I had had a stronger conviction and more courage, we would have isolated the TRF1 gene and probably the rest of shelterin several years earlier.
Although these experiments kept the laboratory reasonably productive in its early years, they also extended the risky TRF gamble further in time.

TRF2, Rap1, TIN2: Missed opportunities
In early 1996, I found a short cDNA sequence in the EST database that looked similar to the Myb domain of TRF1. Could there be a second TRF? I considered this unlikely because I had done super-shift experiments with a TRF1 antibody; all the detectable HeLa dsTTAGGG-binding activity was shifted up by incubation with the TRF1 antibody, suggesting that TRF1 was the only TTAGGG repeat binding factor in the nuclear extract. I had advertised this view to my lab members who were now (nearly) all working on TRF1. So, once again, I could not find anybody to work on this mysterious Myb domain protein. Fortunately, an M.D. Ph.D. student, Agata Smogorzewska, joined the lab for a rotation in the summer of 1996, and she was unaware (or dismissive) of the lab lore. Agata cloned the cDNA of what now is called TRF2 (17). In addition to the Myb domain, TRF1 and TRF2 showed a large region of homology, now known to represent their highly conserved homodimerization domains (18). We called this the TRF-homology (TRFH) domain by analogy to the Src homology (SH) domains, a nod to my Varmus lab oncogene roots. At the N terminus, TRF2 carried a basic domain, rather than the acidic domain that makes TRF1 resemble a transcription factor. The basic N terminus was the reason for my failure to detect TRF2 by gel-shift assays: it created a complex that was hung up in the slot (17). So if Lily and I had used a different gel system, I could have isolated TRF1 and TRF2 at the same time. A missed opportunity . . .
By 1996, I had caught on to the idea that there might be more proteins at telomeres (additional factors besides Rap1 had by now been identified in yeast), and many lab members started looking for them. A two-hybrid screen done by postdoctoral fellow Susan Smith had delivered tankyrase, a poly(ADP-ribose) polymerase (PARP) that binds to TRF1 (19). A second two-hybrid screen by postdoctoral fellow Bibo Li revealed a TRF2-binding protein that we called HC1 until I realized, while writing the paper, that it was the mammalian ortholog of Rap1 (20).
We were pleased with our new proteins and stopped doing two-hybrid screens, turning to the newly developed MS approach instead. This was yet another mistake. Both of our two-hybrid screens had missed a key protein in shelterin, TIN2, which we now know binds to both TRF1 and TRF2. TIN2 (TRF1-interacting nuclear factor 2) was isolated by Judy Campisi's lab in a two-hybrid screen with TRF1 (21). TIN2 did not come out of our TRF2 mass spectrometry efforts either because the TIN2-TRF2 link was unstable under the salt conditions used for protein isolation (22,23). It took several years before Jeff Ye, a postdoctoral fellow in my lab, and also the Songyang lab demonstrated that TIN2 could bind to both TRF1 and TRF2 (24,25).

Rif1 and Tel2: Following the yeast trail to nowhere
Because Bibo Li had found the mammalian ortholog of yeast Rap1, I was convinced we should be able to find other budding yeast telomere factors in human cells. In early 2001, Celera allowed academic researchers to search Craig Venter's genome sequence for the steep fee (at least for my lab) of $7,500 per year. As soon as we had access to his DNA sequence, we used budding yeast and fission yeast data to find the most conserved regions of candidate telomere-relevant genes and then BLASTed them through Venter's DNA. Our gene list included the Oxytricha telomeric proteins (α and β), Tel2, and the two Rap1-interacting factors Rif1 and Rif2. The latter three were known to affect telomere length regulation in budding yeast (26-28).
Postdoctoral fellow Diego Loayza rapidly found a human ortholog of Oxytricha TEBPα. He started working on this candidate, cloning the full-length cDNA and then looking at whether the protein localized to telomeres. Before he got very far, we attended the 2002 meeting on Telomeres and Telomerase at Cold Spring Harbor Laboratory (CSHL), where Peter Baumann from Tom Cech's lab announced Diego's gene as POT1 (Protection of Telomeres 1) and showed that the POT1 protein bound human telomeric DNA in vitro (29). After this painful episode, Diego recovered and studied the effect of POT1 on telomere length regulation (30).
Venter's genome delivered not only POT1 but also Rif1 and Tel2 (Rif2 is a budding yeast invention). Rif1 and Tel2 became the thesis projects of two M.D. Ph.D. students: Josh Silverman (Rif1) and Rich Wang (Tel2). My expectation was that they would find that Rif1 and Tel2 affect telomere length, so that they could graduate in time to finish their medical training. However, we learned that neither protein had anything to do with telomeres. Rif1 is a DNA damage-response factor as Josh showed (31), and Tel2 is a chaperone that regulates the stability of PI3K-related kinases as postdoctoral fellow Hiro Takai figured out later (32). Rich had given up on Tel2 and graduated on his finding that TRF2 protects telomeres from t-loop cleavage (33). Ever since this episode, I have been hesitant to take budding yeast information too literally as a guide for our work on mammalian telomeres.

TPP1: The final link
In 2002, Jeff Ye had found that TIN2 interacts with both TRF1 and TRF2 and that this interaction stabilizes the two TRFs on telomeres in vivo (24), leading us to think that telomeres contained a complex composed of TRF1-TIN2-TRF2-Rap1 and a separate POT1 protein. But Diego Loayza had found that POT1 is somehow linked to TRF1 and TIN2 (30). Diego had also shown that POT1 does not have to bind telomeric DNA to accumulate at telomeres, consistent with a protein-based recruitment. Jeff ran HeLa nuclear extract over a sizing column and indeed found evidence of a large complex that contained TRF1-TIN2-TRF2-Rap1 as well as POT1 (24). So, most likely, POT1 was recruited to telomeres by an (indirect) interaction with the TRF1-TIN2-TRF2-Rap1 complex. For these reasons, Jeff collaborated with our colleague Brian Chait in a mass spectrometry analysis on TIN2-associated proteins and discovered TPP1 in 2003. We first called it POT1-interacting protein 1 (PIP1) because Jeff had quickly found that TPP1 interacted with POT1, now linking all known telomeric proteins together by protein-protein interactions. Jeff presented his data at the 2004 CSHL meeting where two other labs (Smith and Songyang) announced the same result, and a session in the bar made us all settle on the name TPP1 (the two Ps stand for PIP1 and Songyang's PTOP (34) and the T reflects Smith's TINT1 (35)).

Shelterin: Commitment
In June 2005, I arrived at the LXX CSHL Symposium on Quantitative Biology and realized that I was slated to give a special talk, the Reginald Harris lecture. Somehow this had dropped out of my memory. Panicked, I sat in my room for a full day to change my talk into something appropriate for this honor. I reviewed everything we knew about the telomeric protein complex and decided to declare victory. In my talk, I announced that telomeres contain a six-subunit complex, which I proposed to call shelterin. I discussed how shelterin regulates telomere length and proposed that shelterin blocks the DNA damage response by forming the telomeric t-loop structure (36). Coming back from the meeting, I learned that my lab hated the name shelterin. But there was no turning back after my CSHL talk and my promise to Terri Grodzicker that I would write a review on shelterin (37). In the review, I made the argument that shelterin was the main mechanism by which telomeres prevent activation of the DNA damage response (37). I relegated tankyrase, the Mre11 complex, and many other proteins that we (and others) had discovered in association with shelterin to a secondary, accessory role. I defined shelterin components as proteins that only localize to telomeres and only function at telomeres. This definition was too strict. Just as cohesin does more than hold sister chromatids together, several shelterin components are involved in nontelomeric activities. But the idea that there are six proteins in human shelterin, a risky proposition at that time, has held up so far. Apparently, I finally had developed the courage to follow my intuition.

Addendum
John Hanish's experiment also showed that telomeres from one organism (e.g. Tetrahymena) do not function in another (human cells). This was an important point because Jack Szostak and Liz Blackburn had done a similar experiment in Saccharomyces cerevisiae where Tetrahymena telomeres appeared to stabilize the ends of a linear DNA (38). Their experiment was generally interpreted as evidence of trans-kingdom conservation of telomere function, which would be a strong argument against sequence-specific proteins that recognize the ds telomeric DNA. However, on closer inspection, these Tetrahymena telomeres always had yeast telomeric DNA added to them. My interpretation was that the yeast telomeric DNA was needed for stable ends, and the Tetrahymena sequence had functioned as a telomerase primer. John's telomere healing experiments also argued against the idea that telomeres were protected by the G4 structure. This had been a popular model after G4 structures were found to be a common feature of telomeric repeats. Our experiments showed that G4 structures were certainly not enough for telomere function because they could be formed by several of the sequences that did not seed new telomeres.