Glycome Diagnosis of Human Induced Pluripotent Stem Cells Using Lectin Microarray

Induced pluripotent stem cells (iPSCs) can now be produced from various somatic cell (SC) lines by ectopic expression of the four transcription factors. Although the procedure has been demonstrated to induce global change in gene and microRNA expressions and even epigenetic modification, it remains largely unknown how this transcription factor-induced reprogramming affects the total glycan repertoire expressed on the cells. Here we performed a comprehensive glycan analysis using 114 types of human iPSCs generated from five different SCs and compared their glycomes with those of human embryonic stem cells (ESCs; nine cell types) using a high density lectin microarray. In unsupervised cluster analysis of the results obtained by lectin microarray, both undifferentiated iPSCs and ESCs were clustered as one large group. However, they were clearly separated from the group of differentiated SCs, whereas all of the four SCs had apparently distinct glycome profiles from one another, demonstrating that SCs with originally distinct glycan profiles have acquired those similar to ESCs upon induction of pluripotency. Thirty-eight lectins discriminating between SCs and iPSCs/ESCs were statistically selected, and characteristic features of the pluripotent state were then obtained at the level of the cellular glycome. The expression profiles of relevant glycosyltransferase genes agreed well with the results obtained by lectin microarray. Among the 38 lectins, rBC2LCN was found to detect only undifferentiated iPSCs/ESCs and not differentiated SCs. Hence, the high density lectin microarray has proved to be valid for not only comprehensive analysis of glycans but also diagnosis of stem cells under the concept of the cellular glycome.


Increasing attention has been paid to iPSCs and ESCs for their pluripotency and medical applications (1,2). However, establishment of a robust system for evaluating their properties, including differentiation propensity, the risk of contamination by xenoantigens, and even tumorigenic potential, has been hampered by the lack of a comprehensive methodology directly applicable to target stem cells, although this is an emerging issue essential for the safe use of iPSCs in regenerative medicine. In many respects, cell surface glycans are considered ideal targets for directly analyzing or identifying the phenotype of each cell, for the following reasons (3,4). (a) Glycans are located at the outermost cell surface. (b) The total repertoire of cell surface glycans varies at every level of biological organization (i.e. species, tissues, cell types, and molecules). (c) Global alterations of the cellular glycome also occur during development, cellular activation, differentiation, malignant transformation, and inflammation. The cell surface glycans are therefore referred to as the "cell signature" that closely reflects cellular backgrounds and conditions, probably because they actually function as cell-to-cell mediators in a wide range of biological phenomena. This fundamental nature of glycans should be understood in light of the fact that they are not encoded directly in the genome but are generated by a complex system of glycosidases and glycosyltransferases, whose expression and activities are significantly affected by both intracellular and extracellular environmental changes. Indeed, cell surface molecules, such as stage-specific embryonic antigens (SSEA1 and -3/4) (5) and tumor rejection antigens (Tra-1-60 and Tra-1-81) (6-8), are glycobiomarkers widely used to evaluate pluripotency.
Notably, however, these "representative" glycomarkers have been identified following the rather fortuitous development of their specific antibodies, because most carbohydrate structures are poorly antigenic between mammals. In this context, a systematic search is necessary to draw a whole picture of the stem cell glycome and harness its effect on stem cell biology (8,9). For instance, the growth and directed differentiation of stem cells to specific progeny lineages in cell culture remain problematic. Understanding how stem cells communicate with one another and with feeder cells through cell surface glycans may lead to the rational design of specific culture systems. However, the glycome is quite difficult to predict solely from any genomic database because the biosynthesis of the glycan moieties of glycoproteins is not template-driven and is subject to multiple sequential and competitive enzymatic pathways. In this sense, a rapid and sensitive system enabling direct monitoring of cell surface glycans is essential.
Several methods have been developed for glycan analysis based on physicochemical principles, such as liquid chromatography and mass spectrometry (10-12). Lectin microarray is an alternative technology for structural glycomics, in which a panel of lectins with various glycan-binding specificities is printed on a microarray, providing a versatile platform for rapid and high throughput analysis of glycan structures without liberation of glycans (13,14). Lectins are a class of decoder molecules of cell surface glycans distributed throughout organisms, which mediate various functions through specific glycan recognition. Analytical protocols using lectin microarray have been developed for various sample types: free oligosaccharides (14,15), tissue sections (16), cell membrane hydrophobic fractions (17,18), and even whole cells (19,20). This technology has just begun to be applied to a wide variety of biological research, including virus profiling (21), cell profiling (17,20), and development of cancer glycobiomarkers (22-24). For cell profiling, less than 100 ng of protein in hydrophobic fractions is sufficient for each analysis (25,26). Data processing and normalization procedures were optimized to ensure the proper interpretation of the data (25,26). More recently, we have demonstrated that lectin microarray is also applicable to stem cells (27,28), although we have yet to reach a clear conclusion as to how the cellular glycome changes upon induction of pluripotency. Moreover, practical applicability of this technology to the quality control of stem cells has not been attained.
Here, we developed an advanced platform of high density lectin microarray with an increased number of probe lectins (96 lectins) to expand the glycome coverage for more precise comparison of various stem cell glycomes. A systematic survey of the cellular glycome was then performed on 135 cell types in total, including iPSCs (114 cell types) and ESCs (nine cell types). Through this comprehensive analysis, we obtained strong evidence that all of the four SCs with originally distinct glycan profiles have acquired those similar to ESCs upon induction of pluripotency. We also found structural features common to iPSCs and ESCs, which corresponded well to the results of gene expression analysis of glycosyltransferases. Finally, we demonstrate the applicability of lectin microarray to stem cell diagnosis for multiple factors, including discrimination between undifferentiated and differentiated cells as well as detection of contamination by the xenoantigen, αGal epitope.
Lectins-Lectins from natural sources (58 lectins) were purchased from J-OIL MILLS, Vector Laboratories, EY Laboratories, and Seikagaku Corp. (see the lectin list in supplemental Table S2). Recombinant lectins were prepared as follows. Briefly, genes of carbohydrate recognition domains were cloned into pET27b (Stratagene) and were overexpressed in the Escherichia coli BL21-CodonPlus (DE3)-RIL strain under the control of isopropyl-β-D-thiogalactopyranoside (Fermentas Hanover) at appropriate temperatures. All recombinant lectins were purified by affinity chromatography using appropriate sugar-immobilized Sepharose 4B-CL (GE Healthcare) based on the glycan binding specificity of each lectin. They were then dialyzed against diluted PBS (final concentration 2.5 mM phosphate buffer containing 0.015 M NaCl). The protein concentration was determined by a BCA protein assay (Bio-Rad). Lectins were freeze-dried and stored at 4°C until use. The purity was checked by SDS-PAGE and gel filtration chromatography on Shodex PROTEIN KW-802.5 (Shodex). The glycan binding activity and specificity were analyzed by hemagglutinating activity using 4% rabbit erythrocytes, frontal affinity chromatography (35), and glycoconjugate microarray (36).
Lectin Microarray Analysis-Hydrophobic fractions were prepared using CelLytic minimum essential medium protein extraction (Sigma-Aldrich) in accordance with the manufacturer's procedures (25,27). After protein quantification using a BCA assay (Thermo Fisher Scientific), hydrophobic fractions were fluorescently labeled with Cy3 monoreactive dye (GE Healthcare), and excess Cy3 was removed with Sephadex G-25 desalting columns (GE Healthcare). After adjusting the protein concentration to 2 μg/ml with PBST (10 mM PBS, pH 7.4, 140 mM NaCl, 2.7 mM KCl, 1% Triton X-100), the hydrophobic fraction was labeled with Cy3 NHS ester (GE Healthcare). After dilution with probing buffer to 0.5 μg/ml, the Cy3-labeled hydrophobic fraction was applied to the lectin microarray and incubated at 20°C overnight. After washing with probing buffer, fluorescence images were acquired using an evanescent field-activated fluorescence scanner (GlycoStation™ Reader 1200; GP BioSciences). The fluorescence signal of each spot was quantified using Array Pro Analyzer version 4.5 (Media Cybernetics, Bethesda, MD), and the background value was subtracted. The background value was obtained from the area without lectin immobilization. The lectin signals of triplicate spots were averaged and normalized to the mean value of 96 lectins immobilized on the array. An inhibition assay was performed by incubating Cy3-labeled cell membrane fractions of MEF(#1) or MRC5-iPS#25(P22)(#13) with a lectin microarray either in the absence or presence of 100 μg/ml of Galα1-3Galβ1-4GlcNAc-PAA (catalog no. 01-079, Glycotech) or a negative control PAA (catalog no. 01-000, Glycotech).
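The quantification steps described above (background subtraction, averaging of triplicate spots, and normalization to the array mean) can be sketched in a few lines of Python. This is a minimal illustration, not the Array Pro Analyzer procedure itself: the function name `mean_normalize`, the input layout, and the clamping of negative net signals to zero are assumptions that the text does not specify.

```python
from statistics import mean


def mean_normalize(raw_spots, background):
    """Return mean-normalized lectin signals.

    raw_spots:  dict mapping lectin name -> list of replicate spot intensities
    background: intensity measured in an area without immobilized lectin
    """
    # Net signal per spot, then the average over replicate spots.
    # Clamping negative net values to zero is an assumption of this sketch.
    net = {
        lectin: mean(max(spot - background, 0.0) for spot in spots)
        for lectin, spots in raw_spots.items()
    }
    array_mean = mean(net.values())
    if array_mean == 0:
        raise ValueError("all net signals are zero; cannot normalize")
    # Divide each lectin signal by the mean over all lectins on the array
    return {lectin: value / array_mean for lectin, value in net.items()}


# Toy example with three hypothetical lectins instead of 96
toy = {"A": [110.0, 120.0, 130.0], "B": [60.0, 70.0, 80.0], "C": [10.0, 10.0, 10.0]}
normalized = mean_normalize(toy, background=10.0)
```

After mean normalization the signals average to 1 across the array, which is what makes profiles from different samples and scans comparable.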
Gene Expression Analysis-Total RNA was extracted from each sample by using ISOGEN (NipponGene). The global gene expression patterns were monitored using Agilent whole human genome microarray chips (G4112F) with one-color (cyanine 3) dye. This microarray covers 41,000 well characterized human genes and transcripts. Of the 41,000 probes, 16,483 representative probes corresponding to the microarray quality control unique genes were used for the following analyses (37).
Statistics-Unsupervised clustering was performed by employing the average linkage method using Cluster 3.0 software. The heat map with clustering was acquired using Java Treeview. Differences between two arbitrary data sets were evaluated by applying Student's t test to each lectin signal using SPSS Statistics 19 (SPSS). Lectin signals or glycosyltransferase expression levels were selected as significantly different if they satisfied a familywise error rate (FWER) of <0.001 by the Bonferroni method.
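As an illustration of the selection criterion, the following sketch computes a pooled-variance (Student's) t statistic for one lectin and applies a Bonferroni correction to a set of p values. It is not the SPSS procedure used in the study. The function names and the example p values are hypothetical, and converting a t statistic into a p value (which needs the t distribution) is left to a statistics package.

```python
from math import sqrt
from statistics import mean, variance


def student_t(group_a, group_b):
    """Two-sample Student's t statistic (pooled variance) and degrees of freedom."""
    na, nb = len(group_a), len(group_b)
    df = na + nb - 2
    pooled = ((na - 1) * variance(group_a) + (nb - 1) * variance(group_b)) / df
    t = (mean(group_a) - mean(group_b)) / sqrt(pooled * (1 / na + 1 / nb))
    return t, df


def bonferroni_select(p_values, fwer=0.001):
    """Keep tests whose Bonferroni-adjusted p value (p * number of tests) is below fwer."""
    m = len(p_values)
    return [name for name, p in p_values.items() if min(p * m, 1.0) < fwer]


# Hypothetical per-lectin p values, for illustration only
example_p = {"lectin_x": 1e-6, "lectin_y": 5e-4, "lectin_z": 0.5}
selected = bonferroni_select(example_p)
```

With 96 lectins tested, an adjusted threshold of 0.001 corresponds to an unadjusted p value below roughly 1 × 10⁻⁵ per lectin.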
Teratomas-Teratoma formation was performed as described previously (1,2). The 1:1 mixtures of the human iPSC suspension and basement membrane matrix (BD Biosciences) were implanted subcutaneously at 1.0 × 10⁷ cells/site into immunodeficient, non-obese diabetic/severe combined immunodeficiency mice. Teratomas were surgically dissected out 8-12 weeks after implantation and were fixed with 4% paraformaldehyde in PBS and embedded in paraffin. Sections of 10-μm thickness were stained with hematoxylin-eosin.
Glycoconjugate Microarray Analysis-Glycoconjugate microarray production and analysis were performed as described previously (36). Briefly, glycoproteins and glycoside-polyacrylamide conjugates were dissolved in the Matsunami spotting solution at a final concentration of 0.5 and 0.1 mg/ml, respectively. After filtration, they were spotted on the Schott epoxy-coated glass slide using the Microsys non-contact microarray printing robot.
Cy3-labeled lectins dissolved in the probing solution (10 or 1 μg/ml) were applied to each chamber of the glycoconjugate microarray (100 μl/well) and were incubated at 20°C overnight. After washing the chambers with the probing solution, fluorescent images were immediately acquired using an evanescent field-activated fluorescence scanner, the GlycoStation™ Reader 1200, under Cy3 mode. Data were analyzed with the Array Pro Analyzer version 4.5 (Media Cybernetics, Inc.). The net intensity value for each spot was determined as the signal intensity minus the background value. The lectin signals of triplicate spots were averaged and normalized to the highest signal intensity among 98 glycoconjugates immobilized on the array.

Development of High Density Lectin Microarray-In order to increase glycome coverage and the selection range of lectins suitable for stem cell evaluation, we first increased the number of immobilized lectins from 43 to 96, which is the largest number of immobilized lectins reported (39). For this purpose, lectins with defined structures were first categorized into lectin families with different protein scaffolds. We then selected lectins from various lectin families, intending to cover a wider range of glycan binding specificities. In particular, we increased the number of lectins specific to terminal modifications, such as Sia and Fuc, which often change dramatically depending on cell properties. For production of recombinant lectins, the E. coli expression system was chosen to avoid glycosylation of the produced lectins, which might cause nonspecific binding to lectin-like molecules in the samples under study. The recombinant lectins thus produced were purified by affinity chromatography using the most appropriate sugar-immobilized Sepharose. The glycan-binding specificities of the 96 lectins used in this study were analyzed by both glycoconjugate microarray (supplemental Fig. S1 and Table S1; also see "Experimental Procedures") (36) and, more quantitatively, frontal affinity chromatography (see the Lectin Frontier Database Web page) (35,40). Their basic specificities evaluated by the above two analytical methods are briefly summarized in supplemental Table S2. The 96 lectins were spotted onto epoxy-activated glass slides by a non-contact spotter (supplemental Fig. S2), and their quality was extensively assessed using a Cy3-labeled test probe (25). Lot-to-lot variance (coefficient of variation) of the developed high density lectin microarray was confirmed to be low (0.14) after mean normalization (25).
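For reference, the coefficient of variation is the standard deviation divided by the mean. The sketch below shows the calculation for one lectin measured on several array lots. Using the sample standard deviation is an assumption, and the text does not state how the per-lectin values were combined into the single reported figure of 0.14.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean of the values."""
    m = mean(values)
    if m == 0:
        raise ValueError("mean is zero; coefficient of variation is undefined")
    return stdev(values) / m


# Hypothetical mean-normalized signals of one lectin on three array lots
cv = coefficient_of_variation([2.0, 4.0, 6.0])
```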

Transcription Factor-induced Reprogramming Leads to a Global Reversion Down to the Pluripotent State at a Cellular Glycome Level as Well-Using the developed lectin microarray, we have analyzed 135 cell samples in total, including 114 iPSCs, 11 SCs, and nine ESCs, all from human origins, as well as one mouse embryonic fibroblast (MEF). Human iPSCs were generated from four different SC lines: MRC5, AM, UtE, and PAE (supplemental Table S3) (28). We have also analyzed human iPSCs generated from human dermal fibroblasts with four (201B7) (1) and three transcription factors (253G1) (41) and three cell lines of human ESCs (42). All iPSCs used in this study were morphologically similar to ESCs, and their pluripotency was confirmed by staining with the established undifferentiation markers (SSEA4, Tra1-60, Oct4, Nanog, and Sox2) and DNA microarray (28).
Cell membrane hydrophobic fractions were prepared, and the extracted glycoproteins were then labeled with Cy3-N-hydroxysuccinimide ester and analyzed by lectin microarray (25). We have analyzed cell membrane fractions because they can be stored in a freezer until use and are easy to handle, allowing comprehensive analysis of a large number of samples (25,26).
After being mean-normalized, the obtained data were first analyzed by unsupervised hierarchical clustering (Fig. 1). As a result, differentiated SCs and undifferentiated iPSCs/ESCs were clearly separated into two large clusters, whereas the four SCs were further separated according to their origins. This indicates that SCs (MRC5, AM, UtE, and PAE) with different glycan profiles have acquired profiles quite similar to one another and even to ESCs upon induction of pluripotency. Thus, transcription factor-induced reprogramming was found to lead to a global reversion down to the pluripotent state at a cellular glycome level as well (27,28).
FIGURE 1. Unsupervised cluster analysis. Lectin microarray data of iPSCs (n = 123), their parental SCs (n = 11), ESCs (n = 9), and MEF (n = 1) were mean-normalized and log-transformed and then analyzed by Cluster 3.0. The zero value of the lectin signal was converted to 1. Yellow, positive; blue, negative. Clustering method was average linkage. The heat map with clustering was acquired using Java Treeview.
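The preprocessing and clustering used for Fig. 1 can be illustrated with a small pure-Python sketch: zero signals are replaced by 1 before the log transform, and clusters are merged by average linkage. This is not the Cluster 3.0 implementation. The logarithm base (2), the Euclidean distance metric, the function names, and the toy profiles are all assumptions made for illustration.

```python
from itertools import combinations
from math import dist, log2


def log_transform(profile):
    """Log-transform a profile, converting zero signals to 1 first."""
    return [log2(v if v > 0 else 1.0) for v in profile]


def cluster_distance(a, b, vectors):
    """Average linkage: mean pairwise distance between members of two clusters."""
    return sum(dist(vectors[x], vectors[y]) for x in a for y in b) / (len(a) * len(b))


def average_linkage(vectors):
    """Agglomerative clustering; returns the merges as (cluster_a, cluster_b, distance)."""
    clusters = [(name,) for name in vectors]
    merges = []
    while len(clusters) > 1:
        # Find the closest pair of clusters and merge them
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: cluster_distance(pair[0], pair[1], vectors),
        )
        merges.append((a, b, cluster_distance(a, b, vectors)))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges


# Toy two-lectin profiles (hypothetical values, already on a log scale)
toy_profiles = {
    "iPSC_1": [4.0, 1.0],
    "ESC_1": [4.2, 1.1],
    "SC_1": [1.0, 5.0],
    "SC_2": [1.2, 4.8],
}
merges = average_linkage(toy_profiles)
```

In this toy example the two pluripotent profiles merge first and the two somatic profiles merge second, so the final merge joins a pluripotent group with a somatic group, mirroring the two large clusters described in the text.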
Selection of the Best Lectin Probe to Discriminate Pluripotency-We then addressed the challenge of developing a lectin-based procedure to discriminate between differentiated SCs and undifferentiated iPSCs/ESCs, which could be utilized to monitor the state of differentiation. As described, rGC2, rBC2LCN, SNA, TJAI, SSA, rPSL1a, rRSIIL, BPL, and AAL gave significantly higher signals in iPSCs/ESCs than SCs with FWER < 0.001 (Table 1). Among them, rBC2LCN showed the best performance as a probe because it detected only undifferentiated iPSCs/ESCs and never reacted with differentiated SCs and MEF, whereas other lectins also reacted with SCs (Table 1). Namely, although rGC2 showed a better score in terms of FWER (1 × 10⁻²¹) than rBC2LCN (2 × 10⁻¹⁹), the former reacted strongly with MEF (Fig. 5). Similarly, SNA (4 × 10⁻¹¹), a representative α2-6Sia-binding lectin, showed significant cross-reactivity with some of the SCs derived from PAE in addition to MEF (Fig. 5).
Lectins with significantly different signals (FWER < 0.001) between undifferentiated iPSCs/ESCs (n = 123) and differentiated SCs (n = 11) were categorized into six groups based on the glycan binding specificities of lectins. Data are shown with t values. Also see supplemental Table S4. Table S5.
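The reasoning behind the choice of rBC2LCN combines two criteria: statistical significance and the absence of signal in differentiated cells. A minimal sketch of such a two-step filter is shown below. The FWER values are those quoted in the text, whereas the off-target signal values, the cutoff, and the function name are invented for illustration and do not reproduce the measured data.

```python
def pick_specific_probes(candidates, cutoff):
    """Return lectins with no off-target signal above cutoff, best FWER first.

    candidates: dict lectin -> {"fwer": float, "off_target": [signals in SCs/MEF]}
    """
    specific = [
        (info["fwer"], lectin)
        for lectin, info in candidates.items()
        if max(info["off_target"]) <= cutoff
    ]
    return [lectin for _, lectin in sorted(specific)]


# FWER values from the text; off-target signals are hypothetical placeholders
CANDIDATES = {
    "rGC2": {"fwer": 1e-21, "off_target": [0.05, 0.04, 1.8]},
    "rBC2LCN": {"fwer": 2e-19, "off_target": [0.0, 0.0, 0.0]},
    "SNA": {"fwer": 4e-11, "off_target": [0.02, 0.9, 1.2]},
}
best = pick_specific_probes(CANDIDATES, cutoff=0.1)
```

Ranking by FWER alone would place rGC2 first. Adding the off-target filter is what leaves rBC2LCN as the remaining probe in this illustration.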
Monitoring the Contamination of the Xenoantigen, αGal Epitope-From a practical viewpoint, monitoring possible contamination by xenotransplantation antigens in iPSCs/ESCs is essential for their safe use in regenerative medicine. A recombinant MOA (rMOA) recognizes the xenotransplantation antigen Galα1-3Galβ1-4GlcNAc (44) present in most cells from New World monkeys and non-primate mammals, including mice, but not in humans. Indeed, rMOA strongly bound to MEFs but not to any human SCs (Fig. 6). Therefore, rMOA signals should not be detected in human iPSCs. However, triplicate samples of the two cell lines MRC5-iPS#25(P22)(#13-15) and UtE-iPSB05(P13)(#64-66) exhibited significant signals on rMOA. In order to validate whether the binding of rMOA is mediated by the carbohydrate recognition domain of rMOA, we then performed an inhibition assay. As shown in Fig. 6B, the binding of rMOA to cell membrane fractions of MEF and MRC5-iPS#25(P22,#13) was abolished in the presence of 100 μg/ml Galα1-3Galβ1-4GlcNAc-PAA, but no inhibitory effect was observed for 100 μg/ml of a negative control (PAA without sugar moiety), indicating that the binding is due to specific interactions via the rMOA carbohydrate recognition domain. As expected, no inhibitory effect of Galα1-3Galβ1-4GlcNAc-PAA on a Fuc-binding lectin, rAAL, was observed. These data unambiguously reflect contamination by the xenoantigen αGal epitope in the above two cell lines, which were most probably contaminated with MEF.

DISCUSSION
Using the developed high density lectin microarray, we performed a systematic analysis of cell surface glycans of a large set of human iPSCs (114 cell types) and ESCs (nine cell types). As a result, a basis for a rational stem cell evaluation system was established, which can reveal both the state of undifferentiation and inclusion of the αGal epitope (a representative xenoantigen). Such a comprehensive glycome analysis targeting iPSCs and ESCs has never been carried out so far. There are at least three key advantages in using a lectin microarray. 1) An overall glycan profile of each cell type is readily obtained using a relatively small number of cells (~1 × 10³), and thus, the method is widely applicable to stem cells. 2) The proposed evaluation system includes selection of the best probe by a statistical strategy among a number of lectins, which are immobilized on the array. As candidate probes, carbohydrate-binding antibodies developed so far could also be included. 3) Various properties of stem cells can be assessed simultaneously (i.e. with "one-chip" technology). Using the same strategy described in this study, lectin-based evaluation methods targeting tumorigenesis and the differentiation propensity of stem cells could also be developed.
In contrast, Globo H has also been reported to be overexpressed in epithelial cell tumors (46). Furthermore, α2-6Sia up-regulated in iPSCs/ESCs has been reported to be overexpressed in many types of human cancers, and its high expression positively correlates with tumor metastasis and poor prognosis (47). Thus, the glycan alterations upon induction of pluripotency observed in this study are apparently similar to those occurring during malignant transformation, as was implied recently (9). Although the reason for this similarity remains to be elucidated, the characteristic glycan changes should be related to the capacity for unlimited cell proliferation and maintenance, properties common to both cancer cells and pluripotent stem cells.
FIGURE 5. Selection of the best lectin probe to discriminate pluripotency. The mean-normalized signal intensities of rGC2, rBC2LCN, and SNA to MEF (n = 1), SCs (n = 11), and iPSCs/ESCs (n = 123) are shown.
FIGURE 6. Monitoring the contamination of the xenoantigen αGal epitope using the rMOA lectin. A, mean-normalized lectin microarray data are represented by a bar graph. Numbers correspond to cell types described in supplemental Table S3. B, inhibition assay. Cy3-labeled cell membrane fractions of MEF(#1) or MRC5-iPS#25(P22)(#13) were incubated with lectin microarray either in the absence (None) or presence of 100 μg/ml Galα1-3Galβ1-4GlcNAc-PAA or negative control PAA without sugar moiety. Data shown were obtained at gain 110 for MEF and gain 120 for MRC5-iPS#25(P22).
Glycans are located at the outermost cell surface, where various events take place on the basis of cell-to-cell recognition and interactions. Endogenous lectins, major counterpart molecules of glycans, should play crucial roles in these events (e.g. by regulating several signaling pathways). In this context, interactions occurring between cell surface glycans and endogenous lectins are considered to be essential for the maintenance of pluripotency, self-renewal, and differentiation of iPSCs/ESCs (48). Indeed, heparan sulfate proteoglycans were reported to regulate self-renewal and pluripotency of embryonic stem cells (49). Moreover, reduced sulfation on heparan sulfate and chondroitin sulfate was demonstrated to direct neural differentiation of mouse ESCs and human iPSCs (50). Recently, synthetic substrates recognizing cell surface glycans were reported to facilitate the long term culture of pluripotent stem cells (48). Thus, global analysis of the cellular glycomes of iPSCs and ESCs, as performed in this study, will be necessary to provide the basis to explore the functions and applications of stem cell glycobiology. These include rational design of effective substrates and culture conditions to support the long term propagation of ESCs and iPSCs (48). Of course, the results obtained in this study could also be readily applied to staining (specification of the place where the event occurs), enrichment (e.g. lectin-aided capturing of necessary cells), and targeting of specific cells (e.g. elimination of unwanted undifferentiated cells). In this regard, stem cell glycoengineering with the aid of a lectin microarray is a key issue in the realization of regenerative medicine in the near future.