Structural basis of nucleosomal histone H4 lysine 20 methylation by SET8 methyltransferase

Cryo-EM structures of the human SET8–nucleosome complexes reveal the mechanism by which the SET8 methyltransferase binds the nucleosome and specifically recognizes the histone H4 lysine-20 residue.


Introduction
Chromatin accommodates genomic DNA in eukaryotes. The fundamental unit of chromatin is the nucleosome, which wraps DNA around a histone octamer, containing two copies of each of the core histones H2A, H2B, H3, and H4 (Wolffe, 1998). Posttranslational modifications (PTMs) of histones, such as methylation, acylation, and phosphorylation, predominantly occur in the N-terminal tails of histones, and function as epigenetic marks to recruit their specific binding proteins to chromatin (Strahl & Allis, 2000;Kouzarides, 2007;Bannister & Kouzarides, 2011). These modification-specific histone-binding proteins, termed "readers," regulate genomic DNA accessibility by changing the higher order structure and dynamics of chromatin (Ruthenburg et al, 2007). In contrast, histone-modifying enzymes, termed "writers," demarcate genomic regions with distinct chromatin structures and functions by introducing specific histone PTMs, which recruit their particular binding proteins (Ruthenburg et al, 2007).
SET8 (also named PR-SET7, SETD8, or KMT5A) is a histone methyltransferase that is solely responsible for H4K20 monomethylation in cells (Oda et al, 2009;Brustel et al, 2011;Beck et al, 2012;Jørgensen et al, 2013). Intriguingly, SET8 primarily promotes H4K20 monomethylation in the nucleosome (Fang et al, 2002;Nishioka et al, 2002), although it also possesses the ability to methylate the nucleosome-free H4K20 residue (Couture et al, 2005). The nucleosome containing the centromeric histone H3 variant, CENP-A, is a preferred substrate for SET8 because the H4 N-terminal tails in the CENP-A nucleosome are more accessible than those in the canonical H3 nucleosome (Arimura et al, 2019). However, the mechanism by which SET8 specifically targets the nucleosome has remained elusive.
In the present study, we determined the structures of the human SET8-nucleosome complexes with histone H3 and CENP-A by cryo-EM at 3.15 and 3.00 Å resolutions, respectively. The structures explain how SET8 specifically methylates the H4K20 residue in nucleosomes.

Results
The cryo-EM structures of the SET8-nucleosome complexes containing histone H3 and CENP-A To clarify the mechanism by which SET8 specifically promotes H4K20 monomethylation in the nucleosome, we performed single-particle 1 cryo-EM. We purified full-length human SET8 as a recombinant protein (Figs 1A and S1A). The nucleosome core particle (NCP) was reconstituted with recombinant human histones, H2A, H2B, H3.1, and H4, in the presence of the 145-base pair Widom 601 DNA (Fig S1B and C) (Lowary & Widom, 1998). The SET8-NCP complex was then purified by sedimentation in the presence of paraformaldehyde (GraFix) (Fig S1D) (Kastner et al, 2008), and visualized by cryo-EM ( Fig  1B and C). It is possible that the paraformaldehyde cross-linking may affect the SET8 interaction with the NCP. We processed the SET8-NCP complex, followed by a single-particle workflow in the RELION software package (Kimanius et al, 2016). The cryo-EM structure of the SET8-NCP complex was then determined at 3.15 A resolution . Surprisingly, the overall structure of the SET8-NCP complex is different from the structures previously predicted from the low resolution X-ray diffraction data ( Fig S3) (Girish et al, 2016). Unexpectedly, the SET8 interaction with nucleosomal DNA was not obvious, although the EM map of the SET8 region near the nucleosomal DNA is ambiguous.
SET8 can efficiently monomethylate the H4K20 residue in the NCP containing CENP-A (NCP CENP-A) (Arimura et al, 2019). This may happen because, in the NCP CENP-A, the N-terminal tails of H4 adopt the outward conformation, which is preferred for the H4K20 monomethylation by SET8 (Arimura et al, 2019). We prepared the SET8-NCP CENP-A complex, and the cryo-EM structure was determined at 3.00 Å resolution. Consistent with the previous structural studies with CENP-A NCPs (Tachiwana et al, 2011;Pentakota et al, 2017;Chittori et al, 2018;Tian et al, 2018;Allu et al, 2019;Yan et al, 2019;Zhou et al, 2019), the DNA regions of the nucleosomal entry/exit sites are somewhat ambiguous because of the flexible nature of the DNA in the NCP CENP-A (Fig S4A and B). The structure of the CENP-A-specific RG loop is also different from the corresponding loop of H3 in the NCP (Fig S4C and D). However, the overall structure of the SET8-NCP CENP-A complex is quite similar to that of the SET8-NCP complex. Therefore, the structural characteristics of the NCP CENP-A may not affect the specific NCP binding by SET8, except for its preferred H4 N-terminal tail conformation.

The SET8 N-terminal arginine anchor binds the acidic patch of the nucleosome
In both the SET8-NCP and SET8-NCP CENP-A complexes, the globular SET domain of SET8 is located on the surface of the histone octamer (Fig 1D and E). The crystal structure of the SET domain (Couture et al, 2005;Girish et al, 2016) fits well into the EM density maps. Interestingly, the α1 helix of the SET domain is clearly observed in the SET8-NCP and SET8-NCP CENP-A complexes (Fig 2). We found that the N-terminal extension of the SET8 α1 helix forms the arginine anchor, in which the Arg188 and Arg192 residues bind to the acidic patch of the NCP (Fig 2A and B). The SET8 Arg188 and Arg192 residues are separately captured by two acidic pockets, formed with the H2A Glu56 and H2B Glu113 residues and the H2A Glu61 and Glu92 residues, respectively (Fig 2B). The EM densities of the Arg188 and Arg192 residues are also clearly visible in the SET8 N-terminal extension bound to the NCP CENP-A, and both residues are captured by the acidic patch (Fig 2C and D). Therefore, SET8 recognizes both canonical and CENP-A NCPs by the same mechanism, using the Arg188 and Arg192 residues in the N-terminal arginine anchor.
The SET8 arginine anchor is important for its H4K20 monomethylation activity in the nucleosome
The acidic patch of the NCP functions as the binding platform for many NCP binding proteins. To test the functional importance of acidic patch binding by SET8, we prepared an acidic patch-defective NCP (NCP apd). In the NCP apd, the acidic patch Glu and Asp residues (H2A Glu56, H2A Glu61, H2A Glu64, H2A Asp90, H2A Glu91, H2A Glu92, H2B Glu105, and H2B Glu113) were replaced by the neutral, hydrophilic Thr and Ser residues, respectively (Kujirai et al, 2020). These amino acid replacements in H2A and H2B did not affect NCP formation (Fig S5A and B). Interestingly, SET8-NCP binding was drastically decreased in the NCP apd, as compared with the wild-type NCP (Fig 3A, lanes 1-6, Figs 3B and S6). Consistently, H4K20 monomethylation was undetectable in the NCP apd (Fig 3C, lanes 1-6, and Fig S7). We also prepared a SET8 mutant, SET8 R188A/R192A, in which both arginine anchor residues, Arg188 and Arg192, were replaced by Ala (Fig S5C). As expected, the NCP binding of the SET8 R188A/R192A mutant was substantially reduced (Fig 3A, lanes 7-12, Figs 3B and S6), and the H4K20 monomethylation was also diminished (Fig 3C, lanes 7-9, and Fig S7). These results indicated that the interaction between the acidic patch and the SET8 arginine anchor plays an essential role in the H4K20 monomethylation activity on the NCP.
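The relationship between the acidic patch replacements and the arginine anchor contacts can be tabulated in a short Python sketch. It only restates residues named in the text; the one-letter mutation labels (for example "H2A E56T") and all function names are illustrative assumptions, and the Glu-to-Thr and Asp-to-Ser mapping follows the "respectively" reading of the sentence above.

```python
# Acidic patch residues replaced in the NCP apd, as listed in the text.
ACIDIC_PATCH_RESIDUES = [
    ("H2A", "Glu", 56), ("H2A", "Glu", 61), ("H2A", "Glu", 64),
    ("H2A", "Asp", 90), ("H2A", "Glu", 91), ("H2A", "Glu", 92),
    ("H2B", "Glu", 105), ("H2B", "Glu", 113),
]

# Assumed reading of "Thr and Ser residues, respectively".
REPLACEMENT = {"Glu": "Thr", "Asp": "Ser"}
ONE_LETTER = {"Glu": "E", "Asp": "D", "Thr": "T", "Ser": "S"}

# Acidic pockets that capture each arginine anchor residue (from the text).
ANCHOR_CONTACTS = {
    "Arg188": [("H2A", "Glu", 56), ("H2B", "Glu", 113)],
    "Arg192": [("H2A", "Glu", 61), ("H2A", "Glu", 92)],
}


def mutation_label(histone: str, residue: str, position: int) -> str:
    """Build an illustrative label such as 'H2A E56T' (hypothetical notation)."""
    new = REPLACEMENT[residue]
    return f"{histone} {ONE_LETTER[residue]}{position}{ONE_LETTER[new]}"


def lost_contacts() -> dict:
    """For each anchor arginine, list its pocket residues that are mutated."""
    mutated = set(ACIDIC_PATCH_RESIDUES)
    return {
        anchor: [c for c in contacts if c in mutated]
        for anchor, contacts in ANCHOR_CONTACTS.items()
    }


if __name__ == "__main__":
    for res in ACIDIC_PATCH_RESIDUES:
        print(mutation_label(*res))
    for anchor, lost in lost_contacts().items():
        print(anchor, "loses", len(lost), "of", len(ANCHOR_CONTACTS[anchor]))
```

In this tabulation every pocket residue contacted by Arg188 or Arg192 is among the replaced residues, which is consistent with the loss of binding and methylation observed for the NCP apd.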

The peptide-binding cleft of SET8 captures the H4 N-terminal tail in the nucleosome
In the SET8-NCP complexes, the cryo-EM densities corresponding to the H4 N-terminal tails of the NCP and the NCP CENP-A were clearly observed (Fig 4A and B). The superimposition of the crystal structure of the SET domain on the cryo-EM maps of the SET8-NCP complex or the SET8-NCP CENP-A complex revealed that the H4 N-terminal tail fits very well within the peptide-binding cleft of SET8 (Fig 4A and B). In the canonical NCP, the H4 N-terminal tail reportedly adopts two configurations, outward and inward (Arimura et al, 2019). In the NCP CENP-A, the outward conformation of the H4 N-terminal tail is preferred (Arimura et al, 2019). Intriguingly, in both the SET8-NCP and SET8-NCP CENP-A complexes, the H4 N-terminal tail adopts the outward configuration and is incorporated within the peptide-binding cleft of the SET domain (Fig 4C). Therefore, we concluded that the outward configuration of the H4 N-terminal tail is actually the preferred substrate for SET8.

Figure 3. (A) Gel shift assay of the NCP or the NCP apd (the acidic patch-defective nucleosome) with SET8 and SET8 R188A/R192A. Double-stranded DNA and single-stranded DNA are denoted as dsDNA and ssDNA, respectively. The amount of SET8 was titrated. A double-stranded DNA 50-mer containing a trace amount of single-stranded 50-mer was included as competitor DNA. NCP (0.52 μM; lanes 1-3, 7-9) and NCP apd (0.52 μM; lanes 4-6, 10-12) were mixed with SET8 (0, 1.0, and 2.1 μM; lanes 1 and 4, 2 and 5, and 3 and 6, respectively) or SET8 R188A/R192A (0, 1.0, and 2.1 μM; lanes 7 and 10, 8 and 11, and 9 and 12, respectively). (B) Quantification of the results in (A). The average % values of three independent experiments shown in Figs 3A and S6 are plotted against the SET8 concentration. (C) Time course methylation assay of the NCP with SET8 or SET8 R188A/R192A, and the NCP apd with SET8. Lanes 1-3, 4-6, and 7-9 indicate results for the NCP with SET8, the NCP apd with SET8, and the NCP with SET8 R188A/R192A, respectively. The experiments were repeated three times, and the reproducibility was confirmed (Fig S7).

Discussion
H4K20 monomethylation primarily occurs on the nucleosome, rather than the nucleosome-free H4 (Nishioka et al, 2002;Fang et al, 2002). To understand the mechanism by which the H4K20 residue is specifically monomethylated in the nucleosome, in the present study, we determined the cryo-EM structures of the SET8-NCP and SET8-NCP CENP-A complexes (Fig 1). We found that SET8 contains an arginine anchor formed by Arg188 and Arg192, which specifically binds to the acidic patch of the nucleosomes (Fig 2). Mutational analyses revealed that the interaction between the SET8 arginine anchor and the nucleosomal acidic patch is pivotal in SET8-NCP binding and H4K20 monomethylation (Fig 3). These findings explain why SET8 specifically monomethylates the nucleosomal H4K20 residue, rather than the nucleosome-free H4 (Fang et al, 2002;Nishioka et al, 2002). The acidic patch binding by the SET8 arginine anchor may dictate the SET domain orientation on the nucleosome and, thus, fix the peptide-binding cleft of the SET domain in the appropriate position to accommodate the H4 N-terminal tail of the nucleosome (Fig 4).
The present structure differs substantially from the crystal structure model, in which SET8 binds nucleosomal DNA (Fig S3). The previous crystallographic analysis was performed with the acidic patch binding protein, RCC1, to facilitate crystallization (Girish et al, 2016). The presence of RCC1 may inhibit the proper binding of the SET8 arginine anchor to the nucleosomal acidic patch, and perturb active SET8-NCP complex formation. Meanwhile, the acidic patch blocking by RCC1 may help to capture a transient nucleosomal DNA binding state of SET8. This short-lived SET8-NCP binding may function in properly positioning SET8 on the NCP surface, where the SET8 arginine anchor and catalytic center bind to the acidic patch and the H4 N-terminal tail, respectively (Figs 1, 2, and 4).
DOT1L is the human homolog of yeast Dot1, which promotes mono-, di-, and trimethylations of the H3 Lys79 residue (H3K79) in the nucleosome (Nguyen & Zhang, 2011), and belongs to a different class of lysine methyltransferases from SET8. In fact, the Dot1-class proteins lack the SET domain, which is the catalytic domain of SET8. However, DOT1L binds the nucleosomal acidic patch by a mechanism similar to that of SET8, using the arginine anchor containing two conserved arginine residues. The H2B ubiquitination at the Lys120 residue reportedly enhances the DOT1L methyltransferase activity (Briggs et al, 2002;Ng et al, 2002;McGinty et al, 2008). The H2BK120 ubiquitination reportedly stabilizes the DOT1L-nucleosome binding, but does not affect the overall location of DOT1L on the nucleosome surface (Yao et al, 2019). This suggests that the H2BK120 ubiquitination may function as an auxiliary factor in the DOT1L-mediated H3K79 methylation.
Collectively, the evolutionarily conserved arginine anchor in different classes of histone methyltransferases, SET8 and DOT1L, plays a pivotal role in proper nucleosome binding via the acidic patch, and may ensure accurate lysine methylation of the H4K20 and H3K79 residues, respectively.
Previous crystal structures of the NCPs revealed that the H4 N-terminal tail has two conformations, the inward and outward configurations (Arimura et al, 2019). In the SET8-NCP and SET8-NCP CENP-A complexes, the H4 N-terminal tails are close to the outward configuration (Fig 4), indicating that this configuration is required for the H4K20 monomethylation by SET8. In contrast, the inward configuration of the H4 N-terminal tail is induced by its binding to the acidic patch of the neighboring NCP (Luger et al, 1997). The SET8 arginine anchor may have an additional function to mask the acidic patch of the nucleosome, and thus shift the nucleosomal H4 N-terminal tail to the outward configuration for efficient H4K20 monomethylation.
SET8 and SET8 R188A/R192A purification
Human SET8 (KMT5A Isoform 2; UniProt ID: Q9NQR1-2) was used in this study. The coding region of SET8 was inserted into a modified pET15b vector, which contains a His6-tag and a PreScission Protease recognition sequence, instead of the thrombin recognition sequence. SET8 was purified according to the previously described method (Arimura et al, 2019). Briefly, human SET8 was produced in E. coli BL21(DE3) by induction with isopropyl β-D-1-thiogalactopyranoside. The cells were sonicated, and the supernatant containing His6-tagged SET8 was collected by centrifugation. His6-tagged SET8 was purified by Ni-NTA affinity chromatography. The concentration of the recovered His6-tagged SET8 was measured by the Bradford method. PreScission protease was added to the sample (4 U/mg), which was then dialyzed against Mono S wash buffer, containing 50 mM Tris-HCl (pH 7.5), 100 mM NaCl, 10% glycerol, and 2 mM 2-mercaptoethanol. Precipitates were removed by centrifugation. After confirming the His6-tag removal by SDS-PAGE, SET8 was further purified by Mono S cation exchange chromatography with Mono S elution buffer, containing 50 mM Tris-HCl (pH 7.5), 600 mM NaCl, 10% glycerol, and 2 mM 2-mercaptoethanol. SET8 was finally purified by HiLoad 16/600 Superdex 200 pg (GE Healthcare) gel filtration chromatography, in buffer containing 20 mM Tris-HCl (pH 7.5), 100 mM KCl, 0.2 mM EDTA, 10% glycerol, and 1 mM DTT. The purified SET8 was stored at −80°C. The plasmid for the production of SET8 R188A/R192A was generated by PCR site-directed mutagenesis, and SET8 R188A/R192A was purified by the same procedure.

Gradient fixation (GraFix) of the SET8-NCP and SET8-NCP CENP-A complexes
The SET8-NCP sample was prepared by mixing NCP (0.52 μM) with SET8 (1.0 μM) in 1 ml of reaction solution, containing 10 mM HEPES-KOH (pH 7.8), 18 mM Tris-HCl (pH 7.5), 50 mM NaCl, 50 mM KCl, 0.10 mM EDTA, 5% glycerol, 0.5 mM DTT, and 0.10 mM S-adenosyl-L-homocysteine. SET8 was added to the reaction solution containing the NCP in four separate portions. The sample mixture was incubated at 25°C in a water bath for 5 min after each SET8 addition. After the SET8 addition was completed, the sample mixture was incubated at 25°C in a water bath for 15 min. The SET8-NCP CENP-A sample was prepared similarly, by mixing NCP CENP-A (0.52 μM) with SET8 (0.94 μM) in 1 ml of reaction solution.
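The stoichiometry of the mixing step can be checked with a few lines of Python. This is a minimal sketch using the concentrations stated above; the assumption that the four SET8 additions were equal portions is not stated in the text, the volume change on addition is ignored, and the function names are illustrative.

```python
# Concentrations (micromolar) and reaction volume (ml) taken from the text.
NCP_UM = 0.52
SET8_UM = 1.0           # canonical NCP sample
SET8_CENPA_UM = 0.94    # NCP CENP-A sample
VOLUME_ML = 1.0
N_ADDITIONS = 4


def picomoles(conc_um: float, volume_ml: float) -> float:
    """Amount in pmol (1 uM in 1 ml corresponds to 1000 pmol)."""
    return conc_um * volume_ml * 1000.0


def molar_ratio(enzyme_um: float, substrate_um: float) -> float:
    """Molar ratio of SET8 to nucleosome."""
    return enzyme_um / substrate_um


def cumulative_set8(final_um: float, n_additions: int) -> list:
    """SET8 concentration after each addition, ASSUMING equal portions."""
    step = final_um / n_additions
    return [step * (i + 1) for i in range(n_additions)]


if __name__ == "__main__":
    print(f"NCP: {picomoles(NCP_UM, VOLUME_ML):.0f} pmol")
    print(f"SET8:NCP ratio: {molar_ratio(SET8_UM, NCP_UM):.2f}")
    print(f"SET8:NCP CENP-A ratio: {molar_ratio(SET8_CENPA_UM, NCP_UM):.2f}")
    print("SET8 after each addition (uM):", cumulative_set8(SET8_UM, N_ADDITIONS))
```

Under these assumptions, both samples contain slightly less than two SET8 molecules per nucleosome, which is in line with one SET8 per nucleosome face.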
Before applying the samples onto the gradient solution in the centrifuge tube, 1 ml of the gradient solution was removed from the top. After applying the samples onto the gradient solution, the centrifuge tubes were placed in an SW32Ti swinging bucket rotor (Beckman Coulter) and centrifuged at 27,000 rpm at 4°C for 16 h. The sample fractions (1 ml each) were collected from the top of the gradient solution and analyzed by 6% native polyacrylamide gel electrophoresis with 0.2× Tris-borate-EDTA (TBE) buffer. The fractions containing the SET8-NCP or the SET8-NCP CENP-A complex were then purified by chromatography on PD-10 columns (GE Healthcare), in 10 mM HEPES-NaOH buffer (pH 7.5) containing 2 mM TCEP (pH 7.5). The samples were finally concentrated and stored at 4°C.
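Because the centrifugation is reported in rpm, the corresponding relative centrifugal force depends on the rotor radius. The following sketch applies the standard conversion RCF = 1.118 × 10⁻⁵ × r (cm) × rpm². The maximum radius of 152.5 mm is an assumed value for this rotor that is not given in the text and should be checked against the rotor manual.

```python
# Rotor speed from the text.
RPM = 27_000

# ASSUMPTION: maximum radius of the rotor in mm; not stated in the text.
ASSUMED_RMAX_MM = 152.5


def rcf(rpm: float, radius_mm: float) -> float:
    """Relative centrifugal force (x g) from rotor speed and radius.

    Uses RCF = 1.118e-5 * r[cm] * rpm**2, which follows from
    a = omega**2 * r divided by standard gravity.
    """
    radius_cm = radius_mm / 10.0
    return 1.118e-5 * radius_cm * rpm ** 2


if __name__ == "__main__":
    print(f"RCF at r = {ASSUMED_RMAX_MM} mm: {rcf(RPM, ASSUMED_RMAX_MM):,.0f} x g")
```

With the assumed radius, 27,000 rpm corresponds to roughly 124,000 × g at the bottom of the tube; the force is lower toward the top of the gradient.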

Cryo-EM
For the cryo-EM specimen preparation of both the SET8-NCP and the SET8-NCP CENP-A complexes, the samples (2.5 μl) were applied to glow-discharged grids (Quantifoil R1.2/1.3 200-mesh Cu). The grids were blotted without wait time for 5 s (SET8-NCP) or 6 s (SET8-NCP CENP-A) with the blot force set to 0, under 100% humidity at 4°C, using a Vitrobot Mark IV (Thermo Fisher Scientific), and were then directly plunged into liquid ethane. Both the SET8-NCP and SET8-NCP CENP-A complexes were recorded on a Krios G3i cryoelectron microscope (Thermo Fisher Scientific), operated at 300 kV. For the SET8-NCP complex, 2,355 movies were recorded using the EPU (Thermo Fisher Scientific) auto acquisition software with a pixel size of 1.05 Å. Digital micrographs of the SET8-NCP complex were recorded with 63 s exposure times on a Falcon 3EC (Thermo Fisher Scientific) direct electron detector in the electron counting mode, retaining a total of 51 frames with a total dose of ~52 electrons/Å². For the SET8-NCP CENP-A complex, 6,075 movies were recorded using the SerialEM (Mastronarde, 2005) auto acquisition software with a pixel size of 1.05 Å. Digital micrographs of the SET8-NCP CENP-A complex were recorded with 7 s exposure times on a K3 BioQuantum (Gatan) direct electron detector in the electron counting mode, using a slit width of 25 eV and retaining 40 frames with a total dose of ~60 electrons/Å².
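The acquisition parameters above imply a dose per frame, an average dose rate, and a Nyquist limit for each data set. The sketch below derives them from the stated values; the total doses are approximate in the text, so the results are approximate as well, and the class and field names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Acquisition:
    """Data collection parameters as reported in the text."""
    name: str
    total_dose: float     # electrons per square angstrom (approximate)
    frames: int           # frames retained per movie
    exposure_s: float     # exposure time per movie, in seconds
    pixel_size_a: float   # pixel size in angstroms

    @property
    def dose_per_frame(self) -> float:
        """Average dose per frame (electrons per square angstrom)."""
        return self.total_dose / self.frames

    @property
    def dose_rate(self) -> float:
        """Average dose rate (electrons per square angstrom per second)."""
        return self.total_dose / self.exposure_s

    @property
    def nyquist_a(self) -> float:
        """Nyquist limit in angstroms (twice the pixel size)."""
        return 2.0 * self.pixel_size_a


SET8_NCP = Acquisition("SET8-NCP (Falcon 3EC)", 52.0, 51, 63.0, 1.05)
SET8_NCP_CENPA = Acquisition("SET8-NCP CENP-A (K3)", 60.0, 40, 7.0, 1.05)

if __name__ == "__main__":
    for acq in (SET8_NCP, SET8_NCP_CENPA):
        print(
            f"{acq.name}: {acq.dose_per_frame:.2f} e/A^2 per frame, "
            f"{acq.dose_rate:.2f} e/A^2/s, Nyquist {acq.nyquist_a:.2f} A"
        )
```

Both data sets share a Nyquist limit of 2.1 Å, so the reported 3.15 and 3.00 Å resolutions lie well within the sampling limit.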

Image processing
All movie frames of both the SET8-NCP and SET8-NCP CENP-A complexes were aligned using MotionCor2 (Zheng et al, 2017) with dose weighting. The contrast transfer function (CTF) estimation was performed by CTFFIND4 (Rohou & Grigorieff, 2015) from digital micrographs with dose weighting. For the following image processing of both the SET8-NCP and SET8-NCP CENP-A complexes, RELION3.0 and RELION3.1 (Zivanov et al, 2018) were used. The particles were semi-automatically picked with a box size of 180 × 180 pixels, and junk particles were removed by 2D classification, followed by 3D classification. The crystal structure of the NCP (3LZ0), lowpass-filtered to 60 Å, was used as the initial model for the 3D classification of the SET8-NCP complex. The ab initio model generated in RELION3.1 was used as the initial model for the 3D classification of the SET8-NCP CENP-A complex. The 3D classifications for both the SET8-NCP and SET8-NCP CENP-A complexes were performed, followed by particle polishing and a few rounds of CTF refinement. The resolutions of the refined 3D maps of the SET8-NCP and the SET8-NCP CENP-A complexes were 3.15 and 3.00 Å, respectively, as estimated by the gold standard Fourier Shell Correlation (FSC) at an FSC = 0.143 (Scheres, 2016). Local resolutions of the SET8-NCP and SET8-NCP CENP-A complexes were calculated by RELION3.1. The final maps of the SET8-NCP and SET8-NCP CENP-A complexes were normalized with MAPMAN (Kleywegt et al, 2004), and visualized with UCSF Chimera (Pettersen et al, 2004) and UCSF ChimeraX (Goddard et al, 2018). The details of the processing statistics for the SET8-NCP and SET8-NCP CENP-A complexes are listed in Table S1.
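To illustrate how a resolution value is read from an FSC curve at the 0.143 criterion, the following sketch finds the first spatial frequency at which a curve drops below the threshold, using linear interpolation between sampled shells. It is not the RELION implementation, and the curve in the usage example is a hypothetical placeholder rather than data from this study.

```python
from typing import Optional, Sequence


def resolution_at_threshold(
    freqs: Sequence[float],
    fsc: Sequence[float],
    threshold: float = 0.143,
) -> Optional[float]:
    """Return the resolution (in angstroms) where the FSC first falls below
    the threshold, or None if it never does.

    freqs: spatial frequencies in 1/angstrom, in increasing order.
    fsc:   FSC value for each frequency shell.
    """
    for i in range(1, len(freqs)):
        above, below = fsc[i - 1], fsc[i]
        if above >= threshold > below:
            # Linear interpolation between the two neighboring shells.
            frac = (above - threshold) / (above - below)
            f_cross = freqs[i - 1] + frac * (freqs[i] - freqs[i - 1])
            return 1.0 / f_cross
    return None


if __name__ == "__main__":
    # HYPOTHETICAL curve for illustration only; not measured data.
    example_freqs = [0.10, 0.20, 0.30, 0.40]
    example_fsc = [1.00, 0.80, 0.20, 0.00]
    res = resolution_at_threshold(example_freqs, example_fsc)
    print(f"Resolution at FSC = 0.143: {res:.2f} A")
```

Interpolating between shells avoids snapping the reported value to the nearest sampled frequency, which matters when the shells are coarse relative to the resolution of interest.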

Model building and refinement
The crystal structures of the NCP (PDB: 3LZ0), SET8 (PDB: 1ZKK) without ligands, and the atomic model of the CENP-A from NCP CENP-A (PDB: 6C0W) were placed in the cryo-EM maps of the SET8-NCP and SET8-NCP CENP-A complexes, by rigid-body fitting in UCSF Chimera (Pettersen et al, 2004). The complete models of the SET8-NCP and SET8-NCP CENP-A complexes were manually built with COOT (Emsley & Cowtan, 2004), followed by real-space refinement in Phenix (Adams et al, 2010).

Nucleosomal H4K20 monomethylation assay
Purified NCP or NCP apd (0.52 μM) was mixed with SET8 or SET8 R188A/R192A (0.15 μM) in 5.0 μl of reaction solution, containing 10 mM HEPES-KOH (pH 7.8), 10 mM Tris-HCl (pH 7.5), 50 mM NaCl, 20 mM KCl, 60 μM EDTA, 2% glycerol, 0.50 mM DTT, 80 μM S-adenosylmethionine, and 1.5 μM double-stranded DNA 50-mer (as a competitor). The reaction mixtures were incubated for 1 or 3 min at 25°C. The reaction was stopped by adding 5 μl of 4% SDS solution, containing 0.10 mM Tris-HCl (pH 6.8), 20% glycerol, and 0.2% bromophenol blue. The samples were then heated at 95°C for 15 min before fractionation by SDS-18% polyacrylamide gel electrophoresis, using a gel prepared with WIDE RANGE gel preparation buffer (Nacalai Tesque). After the electrophoresis, the proteins were transferred to an Amersham Hybond 0.2 μm polyvinylidene difluoride (PVDF) membrane (GE Healthcare) by a Trans-Blot SD Semi-Dry Transfer Cell (Bio-Rad). The membrane was blocked by 5% skim milk powder dissolved in phosphate buffered saline containing 0.05% Tween 20 (PBS-T) for 1 h at room temperature. The membrane was washed with PBS-T and incubated with the primary antibodies, the mouse monoclonal antibody against monomethylated H4K20 (CMA421; Hayashi-Takanaka et al, 2015) and the anti-H2B monoclonal antibody (53H3: Cell Signaling), diluted with Can Get Signal solution 1 (TOYOBO) at 4°C overnight. The anti-monomethylated H4K20 antibody was diluted to a final concentration of 1 μg/ml, and the anti-H2B antibody was diluted 10,000-fold. The membrane was washed with PBS-T, and then incubated with the secondary antibody (Amersham ECL anti-mouse IgG, HRP-linked F(ab′)2 fragment from sheep; NA9310: GE Healthcare) diluted 10,000-fold with Can Get Signal solution 2 (TOYOBO) at 4°C for 2 h. The membrane was washed with PBS-T, and Amersham ECL Prime Western Blotting Detection Reagent (GE Healthcare) was added to the membrane. The image of the blot was acquired by chemiluminescent detection using an Amersham Imager 680 (GE Healthcare).
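The concentrations and dilutions in this assay can be cross-checked with simple arithmetic. The sketch below uses the values stated above; the 10 ml antibody working volume is a hypothetical example that is not given in the text, and the function names are illustrative.

```python
# Values from the text.
NCP_UM = 0.52          # nucleosome, micromolar
SET8_UM = 0.15         # SET8 or SET8 R188A/R192A, micromolar
SAM_UM = 80.0          # S-adenosylmethionine, micromolar
REACTION_UL = 5.0      # reaction volume, microliters
STOP_UL = 5.0          # volume of stop solution added, microliters
STOP_SDS_PERCENT = 4.0
ANTIBODY_FOLD = 10_000


def after_mixing(conc: float, own_volume: float, added_volume: float) -> float:
    """Concentration of a component after another volume is added to it."""
    return conc * own_volume / (own_volume + added_volume)


def stock_volume_ul(final_volume_ml: float, fold: float) -> float:
    """Microliters of stock needed for a given fold dilution."""
    return final_volume_ml * 1000.0 / fold


if __name__ == "__main__":
    print(f"SET8:NCP ratio: {SET8_UM / NCP_UM:.2f}")
    print(f"SAM excess over NCP: {SAM_UM / NCP_UM:.0f}-fold")
    sds = after_mixing(STOP_SDS_PERCENT, STOP_UL, REACTION_UL)
    print(f"SDS after stopping: {sds:.1f}%")
    # HYPOTHETICAL working volume of 10 ml for the antibody solution.
    print(f"Antibody stock for 10 ml: {stock_volume_ul(10.0, ANTIBODY_FOLD):.1f} ul")
```

The enzyme is therefore substoichiometric relative to the nucleosome (about 0.3 SET8 per NCP), while the methyl donor is in large excess, so the time course reports on enzyme turnover rather than cofactor depletion.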