Analysis of the Pre-S2 N- and O-Linked Glycans of the M Surface Protein from Human Hepatitis B Virus

The surface antigen of hepatitis B virus comprises a nested set of small (S), middle (M), and large (L) proteins, all of which are partially glycosylated in their S domains. The pre-S2 domain, present only in M and L proteins, is further N-glycosylated at Asn-4 exclusively in the M protein. Since the pre-S2 N-glycan appears to play a crucial role in the secretion of viral particles, the M protein may be considered as a potential target for antiviral therapy. For characterization of the pre-S2 glycosylation, pre-S2 (glyco)peptides were released from native, patient-derived hepatitis B virus subviral particles by tryptic digestion, separated from remaining particles, purified by reversed-phase high performance liquid chromatography, and identified by amino acid and N-terminal sequence analysis as well as matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF-MS). Pre-S2 N-glycans were characterized by anion exchange chromatography, methylation analysis, and on target sequential exoglycosidase digestions in combination with MALDI-TOF-MS, demonstrating the presence of partially sialylated diantennary complex-type oligosaccharides. In addition, the pre-S2 domain of M protein, but not that of L protein, was found to be partially O-glycosylated by a Gal(β1–3)GalNAcα-, Neu5Ac(α2–3)Gal(β1–3)GalNAcα-, or GalNAcα-residue. The respective O-glycosylation site was assigned to Thr-37 by digestion with carboxypeptidases in combination with MALDI-TOF-MS and by quadrupole time-of-flight electrospray mass spectrometry. Analytical data further revealed that about 90% of M protein is N-terminally acetylated.

Hepatitis B virus (HBV), belonging to the virus family Hepadnaviridae, is an important etiological agent of acute and chronic liver disease (1,2). Chronic HBV infection may lead to liver cirrhosis and hepatocellular carcinoma, which result in about 1 million deaths per year worldwide. The virus replicates in the liver and is secreted in large amounts of up to 10¹⁰ particles/ml into the blood (3). In addition to 42-nm DNA-containing virions, infected hepatocytes produce subviral, noninfectious 22-nm spherical or filamentous particles in vast excess. The envelopes of virions and subviral particles contain varying amounts of three related HBV-encoded (glyco)protein species termed large (L), middle (M), and small (S) proteins, which are together referred to as HBV surface antigen (HBsAg). S protein is the major component of virions and both spherical and filamentous HBsAg particles, while filaments and virions contain more M and, in particular, more L proteins than spheres (4,5).
All envelope proteins are produced from a single open reading frame (see Fig. 1A) by the use of three different translation start sites, dividing this open reading frame into three domains: the amino-terminal pre-S1 domain, which occurs exclusively in the L protein; the pre-S2 domain, which is present in both M and L proteins and forms the amino-terminal end of the M protein; and the S domain, which is common to S, M, and L proteins. All proteins possess a potential N-glycosylation site at Asn-146 of the S domain, which, however, is only partially utilized. Hence, the proteins exist in two isomeric forms, being either glycosylated or nonglycosylated in this position, and migrate as doublets in SDS gel electrophoresis (Fig. 1B). The second potential N-glycosylation site, present at Asn-4 of the pre-S2 domain, is solely used in the M but not in the L protein (4). In addition to N-glycosylation, pre-S2 domains of HBV M protein expressed in mammalian cell culture (6-8) or of M protein of the related woodchuck hepatitis virus (WHV) (9) have been reported to be O-glycosylated in as yet unidentified positions.
HBV envelope (glyco)proteins are functionally important during the viral life cycle. L protein mediates binding of the virus to human hepatocytes via its pre-S1 domain (10,11). Furthermore, L protein has been shown to be important for virion envelopment and secretion (12). In order to fulfill these functions, the pre-S domain of the L protein has a dual topology. During and directly after translation, it is located at the cytosolic side of the endoplasmic reticulum and, hence, within the virion after budding. Therefore, L is not N-glycosylated in its pre-S region. Later, L molecules change, in part, their topology by an unknown mechanism, resulting in a surface exposure of pre-S1 and pre-S2 at HBsAg particles and virions (13). In contrast, the pre-S2 domain of M protein is translocated cotranslationally to the endoplasmic reticulum lumen and is, thus, N-glycosylated. The role of M protein was not clear for a long time. Recent studies provided evidence, however, that M protein glycosylation at Asn-4 of the pre-S2 domain as well as subsequent trimming of this N-glycan play an important role in the secretion of HBV virions (8, 14-19). Prevention of M protein N-glycosylation by either disruption of the Asn-4-X-Ser-6 sequon (16) or by tunicamycin treatment of HBV-expressing cells (15) suppressed secretion of HBV particles. Likewise, treatment of cell cultures with inhibitors of the oligosaccharide-trimming enzymes α-glucosidase I and II impaired virion secretion (17,18). The pre-S2 N-glycan mediates an association of the M protein with the chaperone calnexin, whereas the N-glycan linked to Asn-146 of the S domain is not involved in calnexin binding (8,16). It is concluded that proper folding and trafficking of the M glycoprotein, assisted by calnexin in a carbohydrate-dependent manner, may play a crucial role in the assembly of virions, whereas secretion of subviral particles is not prevented by glycosylation and trimming inhibitors (15-19).
The pre-S2 N-glycan of the M protein may be considered as a promising target for antiviral therapy of hepatitis B, because viremia can be suppressed by α-glucosidase I inhibitors in experimentally infected woodchucks (18). Previous studies on the structure(s) of the N-glycan at Asn-4 of HBV M protein provided contradictory data proposing either partially fucosylated, complex-type (16,20) or high mannose- and hybrid-type species (21). The presence of terminal mannose would have been biologically interesting, since a genetic linkage between polymorphism of the mannose-binding protein and the persistence of HBV infection was observed (22).
The pre-S1 and pre-S2 domains of HBsAg display numerous T- and B-cell epitopes (23,24), which are capable of inducing neutralizing antibodies and immune protection (25,26). Hence, L and M proteins are also attractive candidates for the development of improved vaccines against HBV, which might (a) override nonresponsiveness to the standard HBV vaccine consisting only of S protein (27), (b) allow immunotherapeutic treatment of chronic HBV infections (28), and (c) prevent selection of escape mutants with mutations in the S protein (29). Several candidate hepatitis B vaccines containing pre-S2 sequences have been developed that are produced in transfected cell cultures (27, 28, 30-33). The usefulness of these vaccines is not completely proven yet, particularly because the anti-pre-S response induced by these vaccines in human recipients is rather weak. One reason for this observation may reside in modifications of the pre-S domain. Since naturally generated HBsAg is the target of neutralizing antibodies in vivo, knowledge of post-translational modifications of the pre-S2 sequence is important for the design and evaluation of future hepatitis B vaccines. Current data on N- and O-glycosylation or phosphorylation (8,34) of the pre-S2 region were obtained from transfected rodent cell cultures. In order to initiate a detailed structure analysis of the in vivo pre-S2 glycosylation, we isolated HBsAg spheres from the plasma of two chronically infected HBV carriers, released the pre-S peptides from the HBsAg particles by trypsin digestion, and determined the primary structure of tryptic glycopeptides.

EXPERIMENTAL PROCEDURES
Isolation of HBsAg-Subviral spherical and filamentous particles were purified from the sera of two chronically HBV-infected donors, genotype D (HBsAg subtype ayw2), with a virus titer of about 6 × 10⁹/ml and an HBsAg concentration of 100 μg/ml. The genotype and the HBsAg subtype were determined by sequencing of the viral genomes as described previously (35). 18 ml of serum were ultracentrifuged in a discontinuous sucrose density gradient (15, 25, 35, 45, and 60% (w/w)) in TNE buffer (20 mM Tris-HCl, pH 7.4, 140 mM NaCl, 1 mM EDTA) for 15 h at 25,000 rpm at 10°C in a TST 28.38 rotor (Beckman, München, Germany). Fractions containing 20-40% sucrose were analyzed on a 12% gel by Laemmli SDS-PAGE under reducing conditions and silver-stained. Fractions with the typical HBsAg protein pattern (Fig. 1B) were pooled, adjusted to a density of 1.31 g/ml with solid KBr, and layered for further purification between a KBr density gradient ranging from 1.16 to 1.34 g/ml. Centrifugation for 36 h and analysis of fractions with densities of 1.20-1.25 g/ml were performed as described above. Pooled fractions were desalted and concentrated by ultrafiltration (Centriplus-100 filter units; Millipore, Eschborn, Germany), washing three times with TNE buffer. The concentration of the purified HBsAg was estimated by A280, assuming a value of 4.3 for 1 mg/ml (36), and by amino acid analysis (see below). Purified HBsAg was stored at −20°C.
Isolation of Tryptic L and M Protein-derived Peptides-Digestion of purified native HBsAg particles (4 mg) was carried out with trypsin (Sequencing Grade, Roche Molecular Biochemicals, Mannheim, Germany) in 1.2 ml of TNE buffer for 1 h at 37°C using an enzyme:substrate ratio of 1:40 (w/w). Peptides and particles were separated by ultrafiltration using one Microcon-100 filter unit (Millipore). Tryptic peptides were isolated by reversed-phase HPLC (rHPLC) on a C18 column (5 μm, 30 nm, 4.6 × 250 mm; Vydac, Hesperia, CA) using 0.1% (v/v) aqueous trifluoroacetic acid with an acetonitrile gradient (0-60% in 60 min) and a flow rate of 1 ml/min at 30°C. Peptides were monitored by absorption at 220 nm, and fractions were collected semiautomatically. Peptide-containing fractions were stored at −20°C.
Matrix-assisted Laser Desorption/Ionization Time-of-flight Mass Spectrometry (MALDI-TOF-MS) of (Glyco)peptides and Released Oligosaccharides-Molecular masses of rHPLC-purified peptides were determined by MALDI-TOF-MS on a Vision 2000 mass spectrometer (Finnigan MAT, Bremen, Germany). 1 μl of peptide solution (1-5 pmol) was mixed with 1 μl of matrix solution (10 mg of 2,5-dihydroxybenzoic acid/ml of 0.1% (v/v) trifluoroacetic acid, 30% (v/v) acetonitrile) and allowed to air-dry. Ions were generated by irradiation with a pulsed nitrogen laser (emission wavelength 337 nm; laser power density about 10⁶ watts/cm²), and positive ions were accelerated and detected in the reflectron and linear mode. For analysis of released oligosaccharides, 50 pmol of glycans in aqueous solution were used and analyzed as described above. For calibration of the peptide mass spectra, human angiotensin and bovine insulin (both from Sigma, Deisenhofen, Germany) were used as external standards. For calibration of the mass spectra of released oligosaccharides, the diantennary oligosaccharide standard NA2 (Galβ4GlcNAcβ2Manα3(Galβ4GlcNAcβ2Manα6)Manβ4GlcNAcβ4GlcNAc; Oxford GlycoSciences, Abingdon, Oxfordshire, United Kingdom) and bovine insulin B (Sigma) were used. The accuracy of mass determination was about 0.03% for free oligosaccharides and (glyco)peptides.
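The stated accuracy corresponds to an absolute tolerance that scales with mass. The following is a minimal sketch, assuming the 0.03% figure is applied as a symmetric relative window around the expected mass; the function names are illustrative and not part of the original procedure.

```python
# Sketch only: treats the reported ~0.03% accuracy as a symmetric relative window.
REL_ACCURACY = 0.0003  # 0.03%, as stated for oligosaccharides and (glyco)peptides


def mass_window(expected_da, rel_accuracy=REL_ACCURACY):
    """Return the (low, high) mass range in Da that counts as a match."""
    delta = expected_da * rel_accuracy
    return expected_da - delta, expected_da + delta


def within_accuracy(observed_da, expected_da, rel_accuracy=REL_ACCURACY):
    """True if the observed mass lies inside the tolerance window."""
    return abs(observed_da - expected_da) <= expected_da * rel_accuracy


if __name__ == "__main__":
    # Hypothetical example masses, chosen only to show how the window grows.
    for mass in (1000.0, 2000.0, 4000.0):
        low, high = mass_window(mass)
        print(f"{mass:7.1f} Da: {low:.2f} to {high:.2f} Da")
```

Under this reading, a species near 4000 Da carries an uncertainty of roughly 1 Da, which is worth keeping in mind when mass differences are matched to monosaccharide increments.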
Amino Acid Analysis-20-50 ng of protein or peptide were lyophilized and hydrolyzed in the gas phase over 6 N HCl, with 0.02% mercaptoethanol, for 24 h at 110°C. Free amino acids were dissolved in 20 μl of 0.5 M borate buffer, pH 7.7, derivatized with Fmoc (N-(9-fluorenyl)methoxycarbonyl), and analyzed by rHPLC on a Merck-Hitachi (Darmstadt, Germany) system composed of an AS-4000 autosampler, an L-6200A pump, an F-1050 fluorescence detector, and a D-6000 interface.
Peptide and Protein Sequencing-HPLC-purified peptides (50-100 pmol) or HBV subviral particles after tryptic digestion (about 500 pmol) were amino-terminally sequenced by automated Edman degradation on an Applied Biosystems (Foster City, CA) pulsed liquid phase sequencer, model 477A or 471A, under standard conditions. Phenylthiohydantoin derivatives of amino acids were identified by an on-line analyzer, model 120A or 140B (Applied Biosystems), with a repetitive yield of 92-95%.
N-terminally blocked tryptic peptides were further digested with chymotrypsin (sequencing grade; Roche Molecular Biochemicals). Resulting products were fractionated by rHPLC and sequenced by Edman degradation as above.
High pH Anion Exchange Chromatography (HPAEC) and Gel Filtration-Separation of released oligosaccharides was carried out at room temperature on a Dionex (Sunnyvale, CA) BioLC system using a CarboPac PA-100 column (4.6 × 250 mm) in series with a CarboPac PA guard column as described in detail earlier (38). A sodium acetate (210 mM) gradient (0-100% in 48 min) in 100 mM NaOH was used at a flow rate of 1 ml/min. Fractions (1 ml) were collected and immediately neutralized with 25 μl of 1 M acetic acid. Oligosaccharide-containing fractions were pooled, lyophilized, and resuspended in water. Desalting was performed by Bio-Gel P2 (Bio-Rad) chromatography as reported earlier (39).
Carbohydrate Constituent and Methylation Analysis-The carbohydrate constituents were analyzed as detailed elsewhere (40). In short, hydrolysis of liberated oligosaccharides was performed with 4 M trifluoroacetic acid at 100°C for 4 h; for hydrolysis of glycoproteins, 0.5 N sulfuric acid in 80% acetic acid (v/v) was used at 80°C for 6 h. Following reduction with sodium borohydride and peracetylation, alditol acetates were analyzed by capillary gas-liquid chromatography/mass spectrometry using the instrumentation and microtechniques described earlier (41). For linkage analyses, oligosaccharide alditols were permethylated and hydrolyzed (42). Partially methylated alditol acetates obtained after reduction and acetylation were analyzed as above.
On Target Sequential Enzymatic Digestion of Carbohydrates with Glycosidases in Combination with MALDI-TOF Analysis-rHPLC-purified O-glycosylated peptides or released complex diantennary glycans of the pre-S2 domain were sequentially digested directly on the MALDI target (43). β-Galactosidase from bovine testes or Diplococcus pneumoniae, β-N-acetylglucosaminidase from D. pneumoniae, and α-mannosidase from jack beans (all from Roche Molecular Biochemicals) and O-glycosidase from D. pneumoniae (Calbiochem, Bad Soden, Germany) were dialyzed against 20 mM ammonium acetate buffer adjusted to the suggested pH for each enzyme (i.e. pH 6 for β-galactosidase from D. pneumoniae and O-glycosidase; pH 5 for β-N-acetylglucosaminidase and α-mannosidase; pH 4 for β-galactosidase from bovine testes) on a floating membrane (Millipore VS) with a pore size of 0.025 μm. Sialidase from Arthrobacter ureafaciens (Calbiochem) was redissolved in water and not dialyzed before usage. 50 pmol (1 μl) of released N-glycans or 2 pmol (0.4 μl) of rHPLC-purified O-glycosylated peptide (19-48) were mixed with an equal volume of 6-aza-2-thiothymine matrix (Sigma; 5 mg/ml in water) directly on the MALDI target, air-dried, and analyzed by MALDI-TOF-MS as described above. Spectra were recorded both in reflectron and linear modes. For glycosidase digestion, the analyte spot was resuspended in 1 μl of 20 mM ammonium acetate buffer (pH 6.0), and 0.4 milliunits of sialidase (0.4 μl) was added. The target was incubated overnight in a moist chamber at 37°C. Spots were air-dried and directly analyzed by MALDI-TOF-MS without adding new matrix. Thereafter, the next enzyme was added, and the reaction was similarly allowed to proceed. The mass profile was determined after each step of digestion.
Thus, in the case of the released N-glycans, four cycles were performed with the same analyte spot using 0.4 milliunits of sialidase and 0.5 milliunits of β-galactosidase, β-N-acetylglucosaminidase, and α-mannosidase each, and two cycles were performed in the case of the O-glycosylated peptide using 0.4 milliunits of sialidase and 0.5 milliunits of O-glycosidase.
For immunostaining of the blotted proteins, the membrane was blocked in 5% low fat milk powder in phosphate-buffered saline, pH 7.5, and incubated with either of the mouse monoclonal antibodies Q19/10 (specific for the N-glycosylated N terminus of M protein) or MA18/7 (specific for the peptide sequence Asp-Pro-Ala-Phe (pre-S1 20-23) in the pre-S1 domain of L protein) at a concentration of 0.5 ng/μl in phosphate-buffered saline with 1% low fat milk powder. Thereafter, anti-mouse antibodies conjugated with alkaline phosphatase (250 units/ml; Roche Molecular Biochemicals) were used at a dilution of 1:1000, and the membrane was developed with 5-bromo-4-chloro-3-indolyl-phosphate/nitro blue tetrazolium substrate (Sigma Fast™, Sigma).
Carboxypeptidase Digestion of Peptides-125 pmol of lyophilized C-terminal tryptic (glyco)peptides of the pre-S2 domain (19-48) were resuspended in 8 μl of 20 mM ammonium citrate buffer, pH 6.0, according to Ref. 44. Aliquots (1 μl) of carboxypeptidases P and Y from Penicillium janthinellum (sequencing grade; Roche Molecular Biochemicals) and yeast (excision grade; Calbiochem), each diluted to 50 ng/μl with digestion buffer, were added. Digestion was carried out at 37°C for different times (0 min to 2 × 24 h). 15 pmol (1.2 μl) of the digest were mixed with 1 μl of 2,5-dihydroxybenzoic acid on the target, and MALDI-TOF analysis was performed as described above.
Table I legend (fragment): peptides (cf. Fig. 2) were identified by MALDI-TOF-MS, amino acid analysis, and N-terminal Edman sequencing. Peptides and assigned post-translational modifications are listed. The N-terminal pre-S2 peptide (positions 1-16) was shown to be fully glycosylated and further modified, in part, by acetylation and/or oxidation. The corresponding peptide derived from the L protein (−6 to 16) was also partially oxidized but not glycosylated at Asn-4. The tryptic peptide 19-48 of the pre-S2 domain was found to be partially O-glycosylated. Peaks 12-14 were not assigned.

Quadrupole Time-of-flight (QTOF) Electrospray Ionization (ESI) Mass Spectrometry of Glycopeptides-Positive ion ESI-MS and ESI-MS/MS were performed on a hybrid QTOF electrospray mass spectrometer (Micromass, Manchester, UK) in the nanospray mode. The ions were produced in an atmospheric pressure ionization/ESI ion source, using argon as a collision gas, and were transported to the mass spectrometer through a hexapole lens for optimal transmission. The nanospray capillaries were produced in the laboratory in Münster, using a Kopf vertical pipette puller/model 720 (David Kopf Instruments, Tujunga, CA). The capillaries were not gold-coated, but an internal wire electrode was used.
Nanoelectrospray low energy collision-induced dissociation was performed as described recently (45) with the peptide 19-48 carrying one Gal-GalNAc disaccharide unit. The sample was dissolved in water to a concentration of 5 pmol/μl and introduced to the nanospray needle. The collision-induced dissociation conditions were optimized for the maximal signal intensity of glycosylated fragments (minimal deglycosylation). For MS/MS sequencing, the triply charged precursor ion was selected in the quadrupole, fragmented in the hexapole collision cell, refocused in the radio frequency-only-hexapole, and extracted orthogonally into the TOF analyzer.
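Selecting a multiply charged precursor and comparing fragment ions observed in different charge states rests on simple charge-state arithmetic. The sketch below illustrates it, assuming protonated positive ions; the m/z values are those reported under RESULTS for the glycopeptide carrying Hex-HexNAc, while the function names and the rounded proton mass are illustrative choices made here.

```python
# Sketch only: charge-state arithmetic for protonated ions, [M + zH]z+.
PROTON = 1.00728  # mass of a proton in Da (rounded)


def neutral_mass(mz, charge):
    """Neutral mass M implied by an ion [M + zH]z+ observed at m/z."""
    return charge * (mz - PROTON)


def mz_at_charge(mass, charge):
    """m/z expected for neutral mass M carrying `charge` protons."""
    return mass / charge + PROTON


def convert_charge(mz, charge_from, charge_to):
    """Re-express an observed m/z at a different charge state."""
    return mz_at_charge(neutral_mass(mz, charge_from), charge_to)


if __name__ == "__main__":
    # Triply charged precursor reported at m/z 1129.7
    print(f"precursor neutral mass: {neutral_mass(1129.7, 3):.1f} Da")
    # Singly charged fragments reported at m/z 1992.7 and 1634.6;
    # their doubly charged counterparts are reported at 996.9 and 817.8.
    for singly in (1992.7, 1634.6):
        print(f"{singly} (1+) -> {convert_charge(singly, 1, 2):.1f} (2+)")
```

Agreement between the singly and doubly charged forms of the same fragment is one way to confirm that two signals belong to one sequence ion before using them for site assignment.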

RESULTS
Isolation of the pre-S2 Glycopeptides-HBV subviral particles were purified separately from the sera of two well characterized HBV carriers infected with two different but typical strains of genotype D virus. In both cases, identical results were obtained. Therefore, only one set of data is presented. To exclude contamination by glycopeptides containing Asn-146, native HBsAg particles were incubated with trypsin. Since the S domain is known to be highly resistant to trypsin, whereas pre-S cleavage sites are very sensitive (46), only glycopeptides of the pre-S region were released (Fig. 1). After removal of the digested HBsAg particles by ultrafiltration, released peptides were isolated by rHPLC (Fig. 2). Peptides eluting in the interval of 32-50 min were identified by amino acid analysis, MALDI-TOF-MS, and peptide sequencing (Table I). Compounds eluting prior to peptide 1 could be neither registered by MALDI-TOF-MS nor characterized by amino acid and sequence analysis and are, therefore, assumed to represent nonpeptide contaminants. In parallel, residual trypsin-treated HBsAg particles (cf. Fig. 1B, lane 3) were also analyzed by Edman sequencing, demonstrating (a) complete removal of pre-S tryptic peptides, (b) Arg-48 to be the most C-terminal cleavage site accessible for trypsin in native particles, and (c) a molar ratio of about 20% of pre-S2-containing proteins (M and L proteins) versus 80% of S proteins in spherical particles.
The N-terminal pre-S2 tryptic peptide (1-16; cf. Fig. 1C) was found by MALDI-TOF-MS analysis to be N-terminally acetylated to an extent of about 90% and was, therefore, not directly accessible to Edman sequencing. Following further digestion of this glycopeptide with chymotrypsin and subfractionation of the resulting peptides by rHPLC, two internal fragments (9-12 and 13-16) were identified by their molecular masses, and a third (4-8) could be assigned by Edman degradation (data not shown). The original glycopeptide eluted from the rHPLC column as several successive peaks (marked as Ia in Fig. 2) due to its heterogeneity in glycosylation and acetylation and/or partial oxidation of the N-terminal Met yielding methionine sulfoxide or methionine sulfone derivatives. Oxidation of Met-1 is indicated by the observation that the original pre-S2 glycopeptides (1-16) displayed, in part, additional mass increments corresponding to one or two oxygen atoms, which were not registered in the case of chymotryptic subfragments (4-8, 9-12, and 13-16). Furthermore, removal of the N-terminal Met by BrCN cleavage was not possible, in agreement with literature data (47), which confirm that oxidized methionine residues are not cleaved by this treatment. It is not clear, however, whether the oxidation already occurred in vivo or reflects an artifact produced during sample handling in the laboratory. In accordance with literature data, we could further demonstrate that the M protein is N-glycosylated to 100% in the pre-S2 domain, whereas the L protein is not glycosylated in this position, as confirmed by the analysis of the peptide (−6 to 16; cf. Fig. 1, peptides 8 and 11 in Fig. 2, and Table I). By a similar line of evidence, the pre-S2 tryptic peptide (19-48) was found to be partially modified by O-glycosylation (see below).
Structural Analysis of Pre-S2 N-Glycans-For carbohydrate structure analysis, pre-S2 N-glycans were preparatively released from peptides 1-6 (cf. Fig. 2) by treatment with peptide-N4-(N-acetyl-β-glucosaminyl)asparagine amidase F and isolated by rHPLC. Neutral carbohydrate constituent analysis revealed a molar ratio of GlcNAc, Man, and Gal of 3.4:3.0:2.2, which is typical for diantennary complex-type N-glycans. Sialic acid residues were not registered by the method employed. This structure was verified by sequential exoglycosidase digestion in combination with MALDI-TOF-MS (Fig. 3) and by methylation analysis. MALDI-TOF-MS spectra of the released native oligosaccharide fraction displayed three major signals indicative of a complex-type diantennary N-glycan carrying zero, one, or two sialic acid residues (Fig. 3A). Sialylated oligosaccharides and glycopeptides are known to lose sialic acid very easily under MALDI conditions. Even with 6-aza-2-thiothymine as matrix, usually less than 50% of sialylated molecules pass through the reflector intact (48). Therefore, it was not possible to quantify the degree of sialylation from these experiments. After treatment with sialidase from A. ureafaciens, peaks reflecting sialylated components shifted to the first one (Fig. 3B). Subsequent treatment with β-galactosidase from D. pneumoniae, cleaving only Gal(β1-4) linkages, caused a shift in the molecular mass approaching 324 Da, demonstrating the release of two hexose residues (Fig. 3C). Further digestion with β-N-acetylglucosaminidase from D. pneumoniae, known to split exclusively GlcNAc(β1-2) bonds, resulted in the loss of two GlcNAc residues and corresponding mass shifts of roughly 406 Da (Fig. 3D), yielding the molecular mass of the common pentasaccharide core of N-linked oligosaccharides. It is noteworthy that these experiments were performed with only 50 pmol of released N-glycans. In order to determine the degree of sialylation, the released oligosaccharides were separated by HPAEC (Fig. 4A), which revealed that 50% of the glycans contained two sialic acids, 45% contained one, and about 5% contained no sialic acid substituent. The detailed structure of the carbohydrate chain is shown in Fig. 4B.
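The mass shifts expected for each exoglycosidase step, and the mean sialylation implied by the HPAEC distribution, can be checked with a short calculation. This is a sketch under stated assumptions: standard average residue masses are used (not values taken from this study), the glycans are treated as free reducing oligosaccharides without adduct ions, and the variable names are illustrative.

```python
# Sketch only: average residue masses (Da) are standard textbook values,
# not measurements from this study; adduct ions are ignored.
HEX, HEXNAC, NEU5AC, WATER = 162.1406, 203.1925, 291.2546, 18.0153


def glycan_mass(n_hex, n_hexnac, n_neu5ac=0):
    """Average mass of a free oligosaccharide with the given composition."""
    return n_hex * HEX + n_hexnac * HEXNAC + n_neu5ac * NEU5AC + WATER


# Composition (Hex, HexNAc, Neu5Ac) before and after each digestion step.
steps = [
    ("native, disialylated", (5, 4, 2)),
    ("after sialidase", (5, 4, 0)),
    ("after beta-galactosidase", (3, 4, 0)),
    ("after beta-N-acetylglucosaminidase", (3, 2, 0)),
]
masses = [glycan_mass(*composition) for _, composition in steps]
shifts = [masses[i] - masses[i + 1] for i in range(len(masses) - 1)]

# HPAEC distribution reported above: fraction of glycans per sialic acid count.
sialylation = {2: 0.50, 1: 0.45, 0: 0.05}
mean_sialic_acids = sum(n * fraction for n, fraction in sialylation.items())

if __name__ == "__main__":
    for (label, _), mass in zip(steps, masses):
        print(f"{label:36s} {mass:8.1f} Da")
    print("shifts:", [f"{s:.1f}" for s in shifts])
    print(f"mean sialic acids per glycan: {mean_sialic_acids:.2f}")
```

The galactosidase and N-acetylglucosaminidase steps come out at about 324 and 406 Da, matching the shifts described above, and the HPAEC proportions correspond to an average of about 1.45 sialic acid residues per glycan.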
In order to compare pre-S2 N-glycans with the known structure of the N-glycan present in the S domain (49,50), trypsin-treated subviral particles, obtained from sera of two different patients, from which the pre-S2 segments were completely removed (as confirmed by SDS-PAGE and Edman sequencing; see Fig. 1C, and see above), were subjected to carbohydrate constituent and methylation analysis. In both cases, the results obtained were compatible with the published diantennary complex-type structure carrying two, one, or zero terminal sialic acids (see Fig. 4B). In detail, Man, Gal, and GlcNAc were identified as neutral oligosaccharide components by carbohydrate constituent analysis in a molar ratio of 3.0:2.2:3.9. Furthermore, methylation analysis revealed the presence of 3,6-disubstituted Man, 2-substituted Man, 4-substituted GlcNAc, terminal Gal, and 6-substituted Gal in a molar ratio of about 1.3:2.0:1.5:0.3:1.6, respectively. The published low degree in fucosylation of about 5%, however, could not be confirmed by the methods used. Furthermore, we could exclude O-glycosylation of the S domain due to the absence of GalNAc.

Identification and Structural Analysis of Pre-S2 O-Glycans-The pre-S2 tryptic peptides (19-48), identified by amino acid analysis and Edman sequencing, were found to elute in four different peaks from the rHPLC column (Fig. 2, brackets IIa and IIb), each one of which displayed a different molecular mass in MALDI-TOF-MS (see Fig. 5 and Table I). The mass differences were consistent with the mass increments of monosaccharide components such as HexNAc, Hex, and NeuAc (cf. Table I). Carbohydrate constituent analysis of the glycopeptides (fraction IIa peptides in Fig. 2) confirmed the peptides to be modified by an O-glycosidically linked glycan, comprising GalNAc and Gal as neutral monosaccharide constituents. Sialic acid was not detectable by the procedure used. The extent of glycosylation was calculated from the peak areas in the rHPLC chromatogram (Fig. 2), resulting in about 40% to be not O-glycosylated, 5% to contain only one GalNAc-residue (Tn-antigen), 30% to contain the disaccharide Gal-GalNAc (Thomsen-Friedenreich antigen, TF-antigen, T-antigen), and about 25% containing the trisaccharide Neu5Ac-Gal-GalNAc (sialosyl T-antigen) (see Fig. 9).
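The relation between the four rHPLC peaks, their expected mass increments over the unmodified peptide, and the overall site occupancy can be summarized in a few lines. The sketch assumes standard average residue masses (theoretical values, not measurements from this study) and takes the percentages directly from the peak-area estimate above; all names are illustrative.

```python
# Sketch only: theoretical average residue masses in Da; percentages are the
# rHPLC peak-area estimates quoted in the text.
HEX, HEXNAC, NEU5AC = 162.1406, 203.1925, 291.2546

# Mass added to peptide 19-48 by each O-glycan (residue masses, no water).
increments = {
    "none": 0.0,
    "GalNAc": HEXNAC,
    "Gal-GalNAc": HEX + HEXNAC,
    "Neu5Ac-Gal-GalNAc": NEU5AC + HEX + HEXNAC,
}

# Relative abundance (%) of each form.
fractions = {
    "none": 40,
    "GalNAc": 5,
    "Gal-GalNAc": 30,
    "Neu5Ac-Gal-GalNAc": 25,
}

# Share of peptide 19-48 carrying any O-glycan.
occupancy = sum(pct for form, pct in fractions.items() if form != "none")

if __name__ == "__main__":
    for form, delta in increments.items():
        print(f"{form:20s} +{delta:7.2f} Da  {fractions[form]:3d}%")
    print(f"O-glycosylated in total: {occupancy}%")
```

On these figures about 60% of the peptide carries an O-glycan, which is why the modification is described as partial.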
To elucidate the structure of the O-glycan, glycopeptides carrying the trisaccharide unit were treated sequentially with sialidase from A. ureafaciens and O-glycosidase from D. pneumoniae, specifically releasing Gal(β1-3)GalNAcα-chains from Ser/Thr residues, directly on the MALDI target. Resulting products were analyzed by MALDI-TOF-MS. The results revealed mass shifts of about 293 and 365 Da, reflecting the release of one sialic acid and one Gal(β1-3)GalNAcα unit, respectively, yielding the molecular mass of the unsubstituted peptide (data not shown; see also Ref. 43). The glycopeptide containing only the disaccharide unit was similarly susceptible to digestion with O-glycosidase.
In order to verify these conclusions, the O-glycopeptides were subjected to methylation analysis. The results supported the assignment that the structure of the sialylated glycan is Neu5Ac(α2-3)Gal(β1-3)GalNAcα-Ser/Thr, i.e. the so-called sialosyl-T-antigen (see Fig. 9B). The assumption that the reducing GalNAc-residue is α-glycosidically linked is based on the reported substrate specificity of the enzyme O-glycosidase, whereas the β-anomeric configuration of galactosyl residues was confirmed by on target digestion with β-galactosidase from bovine testes in combination with MALDI-TOF-MS (data not shown). Notably, sialic acid was found to be linked to C-3 of the subterminal Gal, although A. ureafaciens sialidase is known to release preferentially (α2-6)-linked sialic acid residues. Possibly, the kinetic properties of this enzyme are influenced by the presence of matrix during on target digestion of the glycopeptide.
In order to allocate the observed O-glycosylation to M and/or L proteins, HBsAg filaments (cf. Fig. 1B, lane 4) were separated by SDS-PAGE and subjected to blotting and lectin analysis or immunostaining, before and after treatment with A. ureafaciens sialidase, using peanut agglutinin, specifically binding Gal(β1-3)GalNAc-units, or specific monoclonal anti-HBsAg antibodies directed against M or L proteins. The results revealed both M protein subspecies to be O-glycosylated, whereas L proteins did not react with the lectin, demonstrating that this type of O-glycosylation is obviously a specific feature of the HBV M protein (Fig. 6).
Localization of the Pre-S2 O-Glycan-The sequence of the pre-S2 peptide (19-48) contains seven serine and three threonine residues. Using this peptide for prediction of the O-glycosylation site by the NetOGlyc 2.0 Prediction Server, which produces neural network predictions of mucin type GalNAc O-glycosylation sites in mammalian proteins (51), three threonine (Thr-31, -37, and -38) and two serine residues (Ser-43 and -47) were predicted as potential O-glycosylation sites (see Fig. 1C), with Thr-37 having the highest potential of all. Similar results were obtained by using the complete M or L protein sequences.
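In order to chemically identify the site of O-glycosylation, glycopeptides 15, 16, and 17 as well as the nonglycosylated peptide 19 (see Fig. 2 and Table I) were digested with carboxypeptidases P and Y, and the products were analyzed by MALDI-TOF-MS. Digestion of the nonglycosylated peptide (cf. Table I) led to a ladder of signals due to the removal of from one up to 18-20 amino acids.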
In the case of the glycopeptides carrying the disaccharide or the trisaccharide, only up to 10 amino acids were removed, whereas the peptide modified with GalNAc only allowed the release of up to 11 amino acids (Fig. 7). These results pointed to Thr-37 being most likely the O-glycosylation site, since the carboxypeptidases used cleave only unmodified amino acids from the C terminus of peptides.
Fig. 7 legend (fragment): B, product obtained after extensive digestion (2 × 24 h) with carboxypeptidases P and Y from P. janthinellum and yeast. 11 amino acids are removed from the C terminus of the peptide, suggesting Thr-37 to be the linkage position of the glycan chain (cf. Fig. 1C). Conditions for MALDI-TOF-MS were as in Fig. 5.
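The reasoning from ladder length to glycosylation site is a matter of counting back from the C terminus. The following sketch makes the arithmetic explicit, assuming that digestion halts when the modified residue becomes C-terminal (or shortly before it, if a bulky glycan hinders the enzyme earlier, which is an interpretation made here and not a result of the study). Function and variable names are illustrative.

```python
# Sketch only: counts residues back from the C terminus of tryptic peptide 19-48.
C_TERMINAL_POSITION = 48  # Arg-48 in pre-S2 numbering


def residue_left_at_c_terminus(c_terminal_position, residues_removed):
    """Pre-S2 position of the residue exposed when the ladder stops."""
    return c_terminal_position - residues_removed


# Maximum number of residues released from each glycoform, as reported above.
max_removed = {
    "GalNAc": 11,
    "Gal-GalNAc": 10,
    "Neu5Ac-Gal-GalNAc": 10,
}

stops = {
    form: residue_left_at_c_terminus(C_TERMINAL_POSITION, n)
    for form, n in max_removed.items()
}

# The glycan must sit at or N-terminal to every stop position, so the
# tightest constraint is the smallest position reached.
site_upper_bound = min(stops.values())

if __name__ == "__main__":
    for form, position in stops.items():
        print(f"{form:20s} ladder stops at residue {position}")
    print(f"modified residue is at or before position {site_upper_bound}")
```

The longest ladder, obtained with the GalNAc-only glycopeptide, stops at position 37, consistent with the assignment to Thr-37; the nonglycosylated control shows that the peptide itself does not block digestion at this point.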
To further substantiate this assignment, the glycopeptide carrying the Gal-GalNAc disaccharide unit was studied by QTOF analysis (Fig. 8). In the nanospray ESI-QTOF MS/MS experiment, both carbohydrate and peptide/glycopeptide sequence data were obtained in a single experiment. The triply charged molecular ion [M + 3H]³⁺ at m/z = 1129.7 was used as a precursor ion. It depicted the general composition of the glycopeptide, the peptide chain (19-48) with the attached disaccharide Hex-HexNAc. The predominant fragment ion formation by the cleavage of the peptide chain was observed to be of the b and y type, which was essential for the full assignment of the O-glycosylation site. The N- and the C-terminal sequences were documented by the singly charged b3-b7 and y2-y8 arrays of the peptide fragment ions. The carbohydrate portion was characterized by the disaccharide Hex-HexNAc B-ion at m/z = 366, beside the diagnostic HexNAc glycan ion at m/z = 204. Glycopeptide ions were detected as fully and partially glycosylated singly, doubly, and triply charged fragment ions. The partial deglycosylation in the gas phase occurs due to the relative lability of the glycosidic bonds in comparison with the peptide bonds. However, enough glycosylated species were detected in order to assign the Thr-19 (i.e. Thr-37 of the pre-S2 domain) as the sole glycosylation site, according to the sequence overlapping ions b19 + HexNAc²⁺ at m/z = 996.9, b19 + HexNAc⁺ at m/z = 1992.7, y13 + HexNAc²⁺ at m/z = 817.8, and y13 + HexNAc⁺ at m/z = 1634.6 (see appropriate enlargement in Fig. 8B). In general, excellent coverage of the sequence ions was obtained for this rather large peptide stretch, probably due to the high dynamic range of the QTOF instrument.

DISCUSSION
As outlined in the Introduction, pre-S2 N-glycosylation of HBV M protein as well as proper trimming of the respective carbohydrate chain appear to play a crucial role in the secretion of virions.
Since the N-glycan of HBV M protein is considered to be a potential target for antiviral therapy, elucidation of its exact structure is of high interest. Analyses of the N-glycans attached to the pre-S2 peptide region, however, are complicated by the fact that HBsAg particles, usually used as starting material, contain the S protein in vast excess, which is itself N-glycosylated, in part, at Asn-146. Hence, it is difficult to exclude cross-contamination of the pre-S2 glycan fraction by S protein-derived oligosaccharides. In order to overcome this problem, pre-S2-derived (glyco)peptides were selectively released from intact HBsAg particles by trypsin treatment, leaving the S protein intact. Furthermore, resulting (glyco)peptides were separated by reversed-phase HPLC and identified by amino acid and sequence analysis as well as MALDI-TOF-MS prior to carbohydrate analysis, thus providing for the first time unambiguous information on the pre-S2 N-glycosylation of M protein obtained from patient-derived HBsAg particles. Carbohydrate structure analyses revealed that pre-S2 N-glycans represented exclusively partially sialylated, diantennary complex-type oligosaccharides. Since the HBsAg particles used have been exposed to serum conditions for varying periods of time, the possibility cannot be excluded that monosialylated and unsialylated glycan species may, at least in part, be due to naturally occurring desialylation. In contrast to data reported for N-glycans of (a) HBsAg particles obtained from HBV DNA-transfected HepG2.2.15 cells (16), (b) patient-derived HBsAg subtype adw particles (50), or (c) recombinant hepatitis B pre-S2 surface antigen (20), our analyses detected no fucosylated oligosaccharide species. It is noteworthy that our analytical results also provided no evidence for the presence of a high mannose- or hybrid-type glycan, as suggested primarily on the basis of lectin binding studies (2,21).
The question as to whether the differences in fucosylation reflect cell type- or donor-specific variations in glycosylation remains to be investigated.
Amino acid and sequence analyses, as well as MALDI-TOF-MS, of individual pre-S-derived tryptic and chymotryptic (glyco)peptides revealed that about 90% of the M protein is N-terminally acetylated. Possibly, acetylation of the amino-terminal end protects against proteolytic degradation. This may be particularly relevant for the pre-S2 domain, which is known to be highly sensitive toward proteolysis (46). This finding is remarkable insofar as the recently identified M protein of WHV has been shown not to be N-terminally blocked (9).
The third post-translational modification of the pre-S2 domain comprises O-glycosylation, which has similarly been found to be restricted to the M protein. Although there is evidence that HBV M protein expressed in mammalian cell culture (8) (8) were investigated, the size increase of this protein, as evidenced by SDS-PAGE, was significantly larger than expected for the O-linked side chain described here. Likewise, the increase in size due to O-glycosylation was higher for WHV than for HBV M protein (9). Possibly, O-glycan substituents exist in rodent cell-derived M proteins in higher numbers and/or enlarged structures. Thus, the carbohydrate moieties, antigenicity, and immunogenicity of the pre-S2 domains in certain candidate vaccines may differ from those of natural M proteins.
For exact allocation of the O-glycosylation site, two different strategies were adopted: (a) exhaustive digestion of the glycopeptide with carboxypeptidases in conjunction with MALDI-TOF-MS and (b) ESI-QTOF-MS/MS analysis. Although the C-terminal half of the pre-S2 region is rich in Ser and Thr, both techniques revealed a specific substitution of the protein at Thr-37. This is consistent with recent findings (8) that M protein expressed in COS-7 cells is O-glycosylated between positions 27 and 47 of its amino acid sequence.
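The internal consistency of the ESI-QTOF-MS/MS assignment can be checked with simple charge-state and numbering arithmetic. The sketch below uses only the m/z values and the peptide range 19-48 reported in this study. The helper names and the 0.1 tolerance, chosen because the m/z values are reported to one decimal place, are assumptions made for illustration.

```python
# Illustrative sketch: helper names and the tolerance are hypothetical.
PROTON = 1.00728  # approximate mass of a proton in Da


def neutral_mass(mz, z):
    """Neutral mass of a species observed as [M + zH]z+ at the given m/z."""
    return z * (mz - PROTON)


def mz_at_charge(mass, z):
    """m/z expected for a neutral mass carrying z protons."""
    return mass / z + PROTON


def peptide_to_pre_s2(index, start=19):
    """Map a 1-based position within peptide 19-48 to pre-S2 numbering."""
    return start + index - 1


def y_ion_first_residue(n, end=48):
    """Pre-S2 position of the N-terminal residue contained in the y(n) ion."""
    return end - n + 1


# Singly and doubly charged glycosylated fragment ions reported in the text.
reported = {
    "b19 + HexNAc": {1: 1992.7, 2: 996.9},
    "y13 + HexNAc": {1: 1634.6, 2: 817.8},
}

for name, obs in reported.items():
    predicted = mz_at_charge(neutral_mass(obs[1], 1), 2)
    print(f"{name}: predicted 2+ m/z {predicted:.2f}, reported {obs[2]}")

print(f"precursor neutral mass: {neutral_mass(1129.7, 3):.2f} Da")
print(f"b19 ends at pre-S2 position {peptide_to_pre_s2(19)}")
print(f"y13 starts at pre-S2 position {y_ion_first_residue(13)}")
```

Under these assumptions, the doubly charged m/z predicted from each singly charged ion agrees with the reported value within the rounding of the data. The b19 ion ends at position 37 and the y13 ion starts at position 36, so both glycosylated fragments contain Thr-37.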
When the NetOGlyc 2.0 Prediction Server (51) is used to predict potential O-glycosylation sites, five Ser/Thr residues of the C-terminal pre-S2 peptide sequence are highlighted, with Thr-37 having the highest potential. It is interesting to note that in vivo only this amino acid is, in fact, glycosylated. Exact quantitative estimation of the degree of M protein O-glycosylation is not possible, since unglycosylated peptides 19-48 may be derived from both M and L proteins. Preliminary analyses of pre-S2-derived (glyco)peptides from M protein, isolated by preparative SDS-PAGE and in situ trypsin digestion, similarly revealed the presence of unglycosylated peptides 19-48, ruling out a complete O-glycosylation of this protein (data not shown).
Since Thr-37, embedded in a sequence context favorable for O-glycosylation, is highly conserved in the pre-S2 domains of HBV genotypes B-F, in all primate HBV genomes⁴ including the recently described virus from the new world woolly monkey (52), and even in the otherwise highly divergent pre-S2 sequence of WHV, respective threonine residues might be similarly glycosylated, thus pointing to an as yet unknown function of potential evolutionary advantage. The two isolates of HBsAg analyzed in this study are typical but not totally identical representatives of HBV genotype D2 (53), which is prevalent in Europe and the Middle East. The complete identity of the results, even in the proportions of the various glycan structures, suggests that the detected modifications of pre-S2 may be present in many, possibly most, HBV isolates. The O-glycosylation at Thr-37 of pre-S2, however, cannot be essential because genotype A of HBV has no Thr from position 32 to 55 of pre-S2 but contains numerous Ser residues in this region.
Glycoproteins with O-linked glycans have been found in a number of enveloped viruses including, for example, the M (E1) protein of murine coronavirus, carrying exclusively O-glycans (54), and herpes simplex virus glycoproteins (55), Friend murine leukemia virus glycoprotein 71 (56), respiratory syncytial virus G protein (57), and Marburg virus glycoprotein (58), all of which contain both O- and N-glycans. Distinct functions of the O-linked glycans of viral glycoproteins are still obscure. It is speculated, however, that they may influence the antigenicity of the virion surface structure and protect the polypeptide against degradation.
One putative function of the pre-S2 O-glycan may reside in the partial masking of the respective peptide sequence. This assumption is corroborated by the finding that anti-pre-S2 antibodies are not readily detectable in recipients of pre-S2-containing vaccines and can be only transiently registered in patients with acute hepatitis B.⁵ In particular, mouse monoclonal antibodies directed against an epitope involving Thr-37 or neighboring amino acids have not been identified so far, whereas mouse antibodies recognizing an epitope encompassing the N-glycan linked to Asn-4 are readily available.⁶ In addition to modulation of pre-S2 immunogenicity, O-glycans of M protein might also prevent interaction of hepadnaviral surface proteins with cells of the innate immune system or endothelial cells.
In agreement with earlier qualitative data on N-glycosylation (4), this study definitely proves that HBV L protein is not N-glycosylated in its pre-S region, thus supporting the assumption that the pre-S(1 + 2) domain of L protein is not translocated into the lumen of the endoplasmic reticulum during biosynthesis (13). The absence of O-glycans in L protein might suggest that the translocation of the pre-S(1 + 2) domain occurs in the medial to trans-Golgi or later. Alternatively, the Thr-37 of pre-S2 may be masked by pre-S1 sequences.
The cytosolic orientation of the pre-S domains of the L protein allows alternative post-translational modifications such as myristoylation (59) or phosphorylation, as has been reported for the duck hepatitis B virus (60). The latter modification has also been observed within the HBV pre-S2 domain, if it is exposed to the cytosol (34). In its phosphorylated state, L protein of avian hepadnaviruses acts as a transcriptional activator and mediates intracellular signaling (61). Structural analyses performed in this study provided no evidence for the presence of phosphorylated pre-S2 (glyco)peptides. It is, therefore, concluded that either pre-S2 phosphorylation may not occur in HBV-infected liver cells or phosphorylated L protein may not be incorporated into secreted HBsAg particles.