Crystal structure of inhibitor-bound human MSPL that can activate high pathogenic avian influenza

Viral infection is triggered when a surface envelope glycoprotein, hemagglutinin (HA), is cleaved by host cell proteins of the transmembrane protease serine (TMPRSS) family. The extracellular regions of TMPRSS2, -3, -4, and MSPL are composed of LDLA, SRCR, and SPD domains. MSPL can cleave the consensus multibasic (R-X-X/R-R) and monobasic (Q(E)-T/X-R) motifs on HA, whereas TMPRSS2 and -4 cleave only monobasic motifs. To better understand HA cleavage mediated by MSPL, we solved the crystal structure of the extracellular region of human MSPL in complex with a furin inhibitor. The structure revealed that the three domains are gathered around the C-terminal α-helix of the SPD domain. The inhibitor-bound structure shows that the side chain of P1-Arg inserts into the highly conserved S1 pocket, whereas the side chain of P2-Lys interacts with the Asp/Glu-rich 99-loop that is unique to MSPL. We also constructed a homology model of TMPRSS2, which has been identified as an initiator of SARS-CoV-2 infection. The model suggests that TMPRSS2 accommodates Ala/Val residues at the P2 site better than Lys/Arg residues.


Introduction
Mosaic serine protease large form (MSPL), also known as TMPRSS13, was originally identified from a human lung cDNA library and is a member of the type II transmembrane serine proteases (TTSPs) (1,2). TTSPs comprise a transmembrane domain near the N-terminus and a catalytic serine protease domain at the C-terminus.
Human MSPL is expressed in lung, placenta, pancreas and prostate (1). Little is known about the biological function of human MSPL, although it has been reported to cleave the spike proteins of porcine epidemic diarrhea virus (PEDV) (3) and of MERS- and SARS-CoV (4), influenza virus hemagglutinin (5), and pro-hepatocyte growth factor (6). TTSPs share a similar overall organization comprising an N-terminal cytoplasmic domain, transmembrane region, and stem/catalytic domains at the C-terminus (7). All TTSPs are synthesized as single-chain zymogens and are subsequently activated into the two-chain active forms by cleavage within the highly conserved activation motif. The two chains are linked by a disulfide bridge, so that TTSPs remain bound to the cell membrane (8). The catalytic domain contains a highly conserved 'catalytic triad' of three amino acids (His, Asp, and Ser). A further conserved Asp residue lies at the bottom of the S1 substrate-binding pocket, so substrate specificity is determined by an Arg or Lys residue at the P1 position. Based on similarities in domain structure, the serine protease domain and chromosomal location, TTSPs are classified into four subfamilies: hepsin/TMPRSS, matriptase, HAT/DESC and corin (7,9). MSPL belongs to the hepsin/TMPRSS subfamily.
In this subfamily, hepsin and spinesin contain a single scavenger receptor cysteine-rich repeat (SRCR) domain in the stem region, while MSPL, TMPRSS2, -3, and -4 contain a low-density lipoprotein receptor A (LDLA) domain near the single SRCR domain in the stem region (9). The SRCR domain contains approximately 100-110 amino acids that adopt a compact fold consisting of a curved β-sheet wrapped around an α-helix, and is stabilized by 2-4 disulfide bonds. Depending on the number and position of the cysteine residues, SRCR domains are divided into three subclasses (groups A, B and C) (10). The canonical LDLA domain, by contrast, comprises approximately 40 amino acids, including six conserved cysteine residues that are involved in the formation of disulfide bonds. The LDLA domain also contains a calcium ion coordinated by six highly conserved residues near the C-terminus. Together, the disulfide bonds and calcium binding stabilize the overall structure of the LDLA domain (11).
It was recently reported that TMPRSS2, -4, and MSPL contribute to influenza virus infection by cleaving the glycoprotein hemagglutinin (HA) on the viral surface (4,5,12,13,14). Specifically, HA is cleaved into HA1 and HA2 subunits by TMPRSS2, -4, and MSPL. Proteolytic cleavage of HA is essential for influenza virus infection, where HA1 mediates host cell binding as well as initiation of endocytosis and HA2 controls viral-endosomal fusion (15). To date, two main HA processing consensus motifs in the influenza virus have been identified. One is the single basic HA processing motif (Q(E)-T/X-R) in human seasonal influenza viruses, which contains a single arginine at the cleavage site. The other is a multiple-basic-residue motif (R-X-X/R-R and K-K/R-K/T-R) found in highly pathogenic avian influenza viruses, which contains several basic amino acids at the cleavage site. TMPRSS2 and -4 recognize the single basic HA processing motif, while MSPL recognizes both the single basic and multiple-basic-residue motifs (5). However, it is not clear why only MSPL is able to recognize the multiple-basic-residue motif. The multibasic motif is also known to be recognized by ubiquitously expressed furin and proprotein convertases (PCs) 5/6 in the trans-Golgi network (16). A previous study showed that the enzyme activity of MSPL was inhibited by decanoyl-RVKR-cmk, which mimics the furin substrate (5). To date, only one structure of the extracellular region of a hepsin/TMPRSS family protein, that of hepsin, has been reported (17). The crystal structure of hepsin revealed that the SRCR domain is located on the opposite side from the active site of the SPD, and that these domains are splayed apart. Because hepsin lacks the LDLA domain, the relative orientation of the LDLA, SRCR and SPD domains in other members of the hepsin/TMPRSS family, such as MSPL, is still unknown.
To elucidate the spatial arrangement of the three domains and the substrate specificity, we determined the crystal structure of the extracellular region of MSPL in complex with the decanoyl-RVKR-cmk peptide at 2.6 Å resolution. To our surprise, the overall structure reveals that the spatial arrangement of the SRCR and SPD domains in MSPL is markedly different from that in hepsin. The complex structure explains how MSPL is able to recognize both the single- and multiple-basic-residue motifs. In addition, we constructed a homology model of TMPRSS2, which is a key protease in SARS-CoV-2 infection. The model was used to investigate the target sequence preference for the S1/S2 site of the SARS-CoV-2 spike protein.
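The two consensus motifs quoted above can be written as simple patterns over the P4-P1 residues. The following is a minimal sketch only, assuming that "X" means any residue, that "/" separates alternatives at one position, and that the monobasic motif occupies P3-P1; the function name and the example input sequences other than RVKR (the inhibitor sequence) are hypothetical.

```python
import re

# Assumed reading of the motifs in the text, written P4-P3-P2-P1:
#   multibasic: R-X-X/R-R   -> R, any, any, R
#               K-K/R-K/T-R -> K, K or R, K or T, R
#   monobasic:  Q(E)-T/X-R  -> any, Q or E, any, R
MULTIBASIC = [re.compile(r"R..R"), re.compile(r"K[KR][KT]R")]
MONOBASIC = re.compile(r".[QE].R")

# Proteases reported in the text (ref. 5) to recognize each motif class.
CLEAVED_BY = {
    "multibasic": {"MSPL"},
    "monobasic": {"MSPL", "TMPRSS2", "TMPRSS4"},
    "no match": set(),
}


def classify_p4_p1(site: str) -> str:
    """Classify a four-residue P4-P1 string as multibasic, monobasic or no match."""
    site = site.upper()
    if len(site) != 4:
        raise ValueError("expected exactly four residues (P4 to P1)")
    # Check the multibasic patterns first, because a multibasic site
    # could also satisfy the looser monobasic pattern.
    if any(pattern.fullmatch(site) for pattern in MULTIBASIC):
        return "multibasic"
    if MONOBASIC.fullmatch(site):
        return "monobasic"
    return "no match"


if __name__ == "__main__":
    for site in ("RVKR", "KKKR", "IQSR", "AAAA"):
        label = classify_p4_p1(site)
        print(site, label, sorted(CLEAVED_BY[label]))
```

Pattern matching of this kind only restates the consensus sequences; it says nothing about cleavage efficiency, which depends on the structural context described in the following sections.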

Overall structure of the MSPL extracellular domain
The extracellular region of hMSPL is composed of an LDLA domain (residues 198-221), an SRCR domain (residues 222-313) and a serine protease domain (residues 321-556) (Fig. 1A). We expressed and purified the extracellular region (residues 187-586) of hMSPL and crystallized the protein with decanoyl-RVKR-cmk, which is known as a furin inhibitor. Diffraction data were collected at the Photon Factory AR-NE3a (Tsukuba, Japan) and the structure was solved to a resolution of 2.6 Å (Fig. 1B). This is the first published structure of an LDLA-containing hepsin/TMPRSS subfamily protein. The refined model contains hMSPL residues 188-558 (except residues 319 and 320), decanoyl-RVKR-cmk, and a calcium ion. Glycans attached to residues Asn250 and Asn400 were observed, but no phosphorylated residues were found (18).
The extracellular region of hMSPL is composed of the non-catalytic portion of the N-terminal region (LDLA domain and SRCR domain) and the catalytic part at the C-terminus (Fig. 1B). The three domains are linked to each other by disulfide bonds. hMSPL is activated by hydrolytic cleavage at Arg320-Ile321, and residues in the 321-581 region are converted to the mature SPD (5). Ile321 is located in a pocket where its N atom interacts with Asp505 (Fig. S1A). Therefore, this structure could represent a mature form in which hMSPL has been processed by an intrinsic protease during expression in the cell. The LDLA domain of hMSPL is 24 residues in length and is composed of two turns and a short α-helical region. A canonical LDLA domain has an N-terminal antiparallel β-sheet and three disulfide bonds (11). Therefore, the LDLA domain of MSPL lacks half of the canonical N-terminal region. Since the SRCR domain of MSPL has only two disulfide bonds, it does not belong to either group A or group B (19). Intriguingly, the 3D structures of the SRCR domains of MSPL and hepsin are very similar despite their low sequence identity (23%), suggesting that the SRCR domain of MSPL belongs to group C (10).
To date, hepsin (PDB entry: 1P57) is the only protein in the same TTSP subfamily with an available 3D structure. However, hepsin lacks the LDLA domain. Here, we compared the structures of hMSPL and hepsin. There are only three residues between the transmembrane domain and the N-terminal Thr188 residue of our structural model. Hence, the extracellular region of MSPL must be located very close to the plasma membrane. Indeed, the region that was predicted to be close to the plasma membrane is enriched in basic residues, such as Arg191, Lys193, Lys213, Lys215, and Arg556 (Fig. 2C). The extracellular region of hepsin is also thought to lie flat against the plasma membrane (17). Hence, both MSPL and hepsin may bind substrate in close proximity to the transmembrane region. However, the extracellular region of MSPL is oriented in the opposite way with respect to that of hepsin.
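The domain boundaries given above can be kept in a small lookup table when annotating residues. This is a minimal sketch; the residue ranges are those stated in the text, while the dictionary and function names are hypothetical.

```python
from typing import Optional

# Domain boundaries of the hMSPL extracellular region as stated in the text
# (inclusive residue numbers).
HMSPL_DOMAINS = {
    "LDLA": (198, 221),
    "SRCR": (222, 313),
    "SPD": (321, 556),
}


def domain_of(residue: int) -> Optional[str]:
    """Return the domain containing `residue`, or None if it lies outside all three."""
    for name, (start, end) in HMSPL_DOMAINS.items():
        if start <= residue <= end:
            return name
    return None


if __name__ == "__main__":
    # Glycosylated Asn250 and Asn400, and the activation-site residue Arg320.
    for residue in (250, 400, 320):
        print(residue, domain_of(residue))
```

With these ranges, the two glycosylated asparagines fall in the SRCR domain (Asn250) and the SPD (Asn400), and residue 320 at the activation cleavage site lies in the linker between them.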

Interaction of the furin inhibitor (decanoyl-RVKR-cmk) with the MSPL active site
As expected, the SPD of MSPL displays the conserved architecture of the trypsin- and chymotrypsin-like (S1 family) serine proteases (Fig. 1B). In the activated MSPL, Ile321 at the cleavage site forms a salt bridge with the conserved Asp505 residue located immediately prior to the catalytic Ser506 residue (Fig. S1A). This interaction might be generated by the activating cleavage. The S1 pocket and oxyanion hole are formed through a conformational change in the nearby hairpin loop (Fig. 3). This salt bridge has also been observed in other proteases such as plasma kallikrein (20) (PDB entry: 1Z8G) and hepsin. The furin inhibitor peptide binds to the SPD of MSPL through its P1-Arg, P2-Lys, C-terminal cmk (chloromethylketone; an active-site-directed group) and N-terminal decanoyl group (Fig. 1C, 3). Covalent interaction between the furin inhibitor and the catalytic residues (His361, Ser506) occurs via nucleophilic attack on the cmk moiety. P1-Arg inserts into the deep S1 pocket, and its carbonyl oxygen atom binds directly to the backbone amides of the oxyanion hole (Gly504 and Ser506). In addition, the guanidino group of P1-Arg forms a salt bridge with the side chain of Asp500, as well as hydrogen bonds with the side chain of Ser501 and the backbone carbonyl of Gly529.
Asp500 is located at the bottom of the S1 pocket. These residues are highly conserved among the hepsin/TMPRSS subfamily (Fig. 4). The interaction between P1-Arg and MSPL is characteristic of trypsin- and chymotrypsin-like serine proteases. By contrast, P2-Lys interacts with residues in the so-called 99-loop (chymotrypsinogen numbering), which contains the catalytic residue Asp409. The Nζ of P2-Lys forms five hydrogen bonds: with the backbones of Asp403 and Glu405, the side chains of Tyr401 and Asp406, and a water molecule. This water molecule also mediates hydrogen bonds with the side chains of Asp406 and the catalytic Asp409 residue. Interestingly, with the exception of the catalytic Asp409, the residues that interact with the side chain of P2-Lys are not conserved among the hepsin/TMPRSS subfamily (Fig. 4, cyan dot). This may explain why other TMPRSSs and hepsin recognize the single basic motif but not the di-basic motif.
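Contacts such as those listed above for the Nζ of P2-Lys are usually identified by a distance search around the atom of interest. The following is a minimal sketch under stated assumptions: it reads fixed-column PDB ATOM/HETATM records, uses an arbitrary 3.5 Å cutoff that is not taken from this study, and runs on made-up coordinates rather than on the deposited model. All function names, the chain identifiers and the residue number of the inhibitor lysine are hypothetical.

```python
import math


def format_atom(serial, name, resname, chain, resseq, x, y, z):
    """Build a fixed-column PDB ATOM record (coordinate columns 31-54)."""
    return (f"ATOM  {serial:5d} {name:<4s} {resname:>3s} {chain}{resseq:4d}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}")


def parse_atom(line):
    """Parse one PDB ATOM/HETATM record into a small dictionary."""
    return {
        "name": line[12:16].strip(),
        "resname": line[17:20].strip(),
        "chain": line[21],
        "resseq": int(line[22:26]),
        "xyz": (float(line[30:38]), float(line[38:46]), float(line[46:54])),
    }


def contacts_within(atoms, query, cutoff=3.5):
    """Return (atom, distance) pairs within `cutoff` angstroms of `query`, nearest first."""
    hits = []
    for atom in atoms:
        if atom is query:
            continue
        distance = math.dist(atom["xyz"], query["xyz"])
        if distance <= cutoff:
            hits.append((atom, distance))
    return sorted(hits, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Made-up coordinates for illustration only; NOT taken from the crystal structure.
    records = [
        format_atom(1, "NZ", "LYS", "I", 3, 0.0, 0.0, 0.0),
        format_atom(2, "OD1", "ASP", "A", 406, 2.8, 0.0, 0.0),
        format_atom(3, "O", "ASP", "A", 403, 0.0, 3.0, 0.0),
        format_atom(4, "CA", "GLY", "A", 504, 10.0, 0.0, 0.0),
    ]
    atoms = [parse_atom(record) for record in records]
    for atom, distance in contacts_within(atoms, atoms[0]):
        print(atom["resname"], atom["resseq"], atom["name"], round(distance, 2))
```

A distance cutoff alone does not establish a hydrogen bond; donor and acceptor chemistry and geometry must also be checked, as dedicated tools such as LIGPLOT do.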

Comparison of the binding mechanisms of furin inhibitor peptide to MSPL and furin
The crystal structure of the furin inhibitor in complex with mouse furin has been determined (21). Although furin also has the same Ser-His-Asp catalytic triad as MSPL, its catalytic domain belongs to the superfamily of subtilisin-like serine proteases (22).
The catalytic domain of furin has a different overall fold from that of MSPL, which belongs to the trypsin- and chymotrypsin-like (S1 family) serine protease family.
Despite the different overall folds of MSPL and furin, the inhibitor peptide (decanoyl-RVKR-cmk) can bind to both enzymes. Therefore, we compared the structure of the MSPL-bound furin inhibitor with that of the furin-bound inhibitor (Fig. 5). Except at P1-Arg, the two inhibitor conformations do not superimpose. In the MSPL:furin inhibitor complex structure, the inhibitor bends at P3-Val. By contrast, in the furin:furin inhibitor complex structure, the inhibitor adopts an extended conformation. As a consequence, the P1, P2, and P4 residues contact furin, whereas the P3 residue is directed away from it.
Nonetheless, structural differences between furin and MSPL do not prevent the inhibitor from binding to both proteins.

Homology model analysis of TMPRSS2
In 2020, the SARS-CoV-2 pandemic killed over 0.5 million people (https://ourworldindata.org/covid-deaths) and resulted in a worldwide recession as people were forced to socially distance. In the early stage of infection, the spike protein of SARS-CoV-2 is cleaved by human TMPRSS2, and converted to the infectious form (23,24). To date, the structure of TMPRSS2 has not been reported. To investigate the structural features of TMPRSS2, we constructed a homology model (Fig. 6) using MSPL as template. Eight out of nine disulfide bonds are conserved (Fig. 4), and the relative domain alignment of TMPRSS2 is similar to that of MSPL. However, the SPD domain, specifically the β12-β13 loop region, displays significant differences (Fig. 6).
These structural changes result in a wide substrate-binding groove, so that TMPRSS2 may more readily capture the target peptide. Furthermore, Glu404, an important residue for P2-Lys recognition in MSPL, is replaced by Lys225 in TMPRSS2 (Fig. 4, 6B).
As mentioned earlier, this substitution is expected to shift the preference of TMPRSS2 toward monobasic targets. Consistent with this, the S1/S2 cleavage site of the SARS-CoV-2 spike protein is reported to have Ala rather than a basic residue at P2 (25,26,27). In summary, our homology model reflects the features of target peptide recognition by TMPRSS2.

Implications for autosomal recessive nonsyndromic deafness
Our structure also helps to predict the tertiary structure of TMPRSS3, which is encoded by the gene responsible for autosomal recessive nonsyndromic deafness. Mutations identified in patients with this syndrome were mapped onto a homology model of TMPRSS3 to better understand the disease. Seven missense TMPRSS3 mutants (D103G, R109W, C194F, R216L, W251C, P404L and C407R) associated with deafness in humans were unable to activate the epithelial sodium channel (ENaC) (28,29). One of the seven missense mutations associated with hearing loss, D103G, is located in the LDLA domain of TMPRSS3 (28,30).
Because Asp103 in TMPRSS3 corresponds to Asp221 in MSPL, the LDLA structure stabilized by calcium-binding may be important for the function of the protein. Indeed, the mutations in LDLA and SRCR (D103G, R109W and C194F) as well as the SPD domains of TMPRSS3 affect its autoactivation by proteolytic cleavage at the junction site between the SRCR and the SPD domains (30).

Conclusion
In this study, we have elucidated the structure of the extracellular domain of MSPL and the spatial arrangement of its three domains (LDLA, SRCR, and SPD), as well as the substrate sequence specificity of MSPL. These findings will be useful in designing novel anti-influenza drugs that prevent uptake of highly pathogenic avian influenza (HPAI) virus into the host cell. MSPL also contributes to the cleavage and activation of the severe acute respiratory syndrome coronavirus (SARS-CoV) and Middle East respiratory syndrome coronavirus (MERS-CoV) spike proteins (4).

Cloning, expression, and purification
Soluble recombinant hMSPL was produced using a previously established stable cell line expressing hMSPL (5); the secreted protein accumulated in serum-free culture medium (SFCM).

Complex formation, crystallization, and data collection
The peptide inhibitor (decanoyl-RVKR-cmk) was purchased from Merck-Millipore and  Table S1.

Structure determination and refinement of the MSPL-inhibitor peptide complex
The structure of the complex was solved by the molecular replacement method using the program MolRep (33), with SPD of human plasma kallikrein (PDB code: 2ANY), which shows the highest sequence identity score (46.1%), as a search model. The model of SPD was manually fixed with COOT (34) and refined with Refmac5 (35). Once the SPD of MSPL was well refined, interpretable electron density of the unmodeled region was evident. The model of the LDLA and SRCR domains was then manually built.
The final model contained one MSPL, one furin inhibitor, four sugars, 80 ions, and 65 water molecules, with R-work and R-free values of 18.5% and 25.1%, respectively. The refinement statistics are summarized in Table S1. In the MSPL-peptide inhibitor complex, some residues (the N-terminal 3xFLAG-tag, His187, Gly319, Arg320, and the C-terminal Thr559-Val581) are missing due to disorder. All the structures in the figures were prepared using PyMOL (http://www.pymol.org/). The MSPL/peptide inhibitor interfaces were analyzed using LIGPLOT (36).
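For readers unfamiliar with the R-work and R-free values quoted above, both use the standard crystallographic residual R = Σ||Fobs| - k|Fcalc|| / Σ|Fobs|; R-work is computed over the reflections used in refinement and R-free over a held-out test set. The sketch below illustrates the formula only. It assumes a fixed scale factor k and uses made-up amplitudes; it is not how Refmac5 computes its statistics internally.

```python
def r_factor(f_obs, f_calc, scale=1.0):
    """Crystallographic R-factor: sum(| |Fobs| - k|Fcalc| |) / sum(|Fobs|)."""
    if len(f_obs) != len(f_calc):
        raise ValueError("observed and calculated amplitude lists differ in length")
    denominator = sum(abs(obs) for obs in f_obs)
    if denominator == 0:
        raise ValueError("sum of observed amplitudes is zero")
    numerator = sum(abs(abs(obs) - scale * abs(calc))
                    for obs, calc in zip(f_obs, f_calc))
    return numerator / denominator


if __name__ == "__main__":
    # Made-up structure-factor amplitudes for illustration only.
    observed = [100.0, 50.0, 25.0]
    calculated = [90.0, 55.0, 30.0]
    print(f"R = {r_factor(observed, calculated):.3f}")
```

Applying the same function separately to the working and test reflections would give R-work and R-free; a gap between the two, as in the 18.5% and 25.1% reported here, reflects how much better the model fits the data it was refined against.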

Homology modelling of TMPRSS2
The sequence alignment of the extracellular regions of MSPL and TMPRSS2 was obtained using the BLAST webserver (https://www.uniprot.org/blast/). The amino acid identity between MSPL and TMPRSS2 was 39.8%, with a score of 704 and an E-value of 1.1e-86. The homology model of TMPRSS2 was built using MODELLER (37).
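Percent identity is the fraction of aligned positions at which the two sequences carry the same residue. The sketch below shows one way to compute it from a pairwise alignment. It assumes that gaps are written as "-" and that only gap-free columns are counted in the denominator; other tools may count alignment columns differently, so this function need not reproduce the 39.8% reported above. The function name and the toy sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over the gap-free columns of a pairwise alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Keep only columns where both sequences have a residue.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if a != gap and b != gap]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


if __name__ == "__main__":
    # Toy alignment for illustration only; not MSPL or TMPRSS2 sequence.
    print(percent_identity("AC-DE", "ACGDQ"))
```

The choice of denominator matters most for alignments with long gaps, which is why identity values from different programs are not always directly comparable.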

Data availability
The coordinates and structure factors of the MSPL-peptide inhibitor complex have been deposited to the RCSB Protein Data Bank (PDB code: 6KD5).

Author Contributions
The authors have jointly contributed to project design, data analysis, and manuscript

(B) Electrostatic surface potential of the MSPL and TMPRSS2 SPDs. MSPL has a narrow groove that fits the downstream peptide chain (green arrow). By comparison, in TMPRSS2 the groove is significantly wider, and the peptide binding site is bowl-shaped (cyan oval A). A positively charged area derived from Lys225 is indicated by green oval B. The potential map is colored from red (-5 kT/e) to blue (+5 kT/e).