Introduction

The coronavirus disease 2019 (COVID-19) outbreak, caused by severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2), has jolted the entire human race. The COVID-19 pandemic has affected people in more than 200 countries and territories spanning different geographical locations and climatic conditions. According to the World Health Organization (WHO), nearly 3.8 million individuals have been infected globally and more than 260 thousand have died to date (WHO COVID-19 Situation Report-109, 08/5/2020). Recent trends indicate a much lower overall case fatality rate (CFR) for the COVID-19 pandemic (4.58%) than for the outbreaks of SARS (11%) and Middle East respiratory syndrome (MERS) (35%) (Peeri et al. 2020). Incidence and CFR vary drastically among populations: the present global CFR (6.9%) is considerably lower than that of Europe (9.2%) but higher than those of Africa (3.4%), the Eastern Mediterranean and Southeast Asia (3.6%), the Western Pacific (4.1%) and the Americas (5.5%) (WHO COVID-19 Situation Report-109, 08/5/2020). Recent epidemiological data showed an association of hypertension and type 2 diabetes with the incidence and case fatality of SARS-CoV-2 infection (Huang et al. 2020; Wu et al. 2020).
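As a rough illustration of how the crude CFR quoted above relates to the case and death counts, the following minimal Python sketch divides deaths by confirmed cases. It uses only the rounded global totals given in the text (about 3.8 million cases and 260 thousand deaths), so it lands near, not exactly on, the reported 6.9%. The function name is an assumption introduced here for illustration.

```python
def case_fatality_rate(deaths: int, confirmed_cases: int) -> float:
    """Return the crude case fatality rate as a percentage."""
    if confirmed_cases <= 0:
        raise ValueError("confirmed_cases must be positive")
    return 100.0 * deaths / confirmed_cases


# Rounded global totals quoted in the text (WHO Situation Report-109).
global_cfr = case_fatality_rate(deaths=260_000, confirmed_cases=3_800_000)
print(f"Global crude CFR from rounded totals: {global_cfr:.1f}%")
```

A crude CFR of this kind ignores undiagnosed infections and the lag between diagnosis and death, so it is only a coarse basis for comparing regions.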

The spike protein (S-protein) of coronaviruses interacts with specific membrane receptors to enter human cells. The S-protein of SARS-CoV-2 has conserved tertiary folds that are involved in interaction with human receptors, but it has limited sequence similarity with previously known coronaviruses (such as SARS-CoV and MERS-CoV) (Xu et al. 2020). Multiple host receptors, such as angiotensin converting enzyme 2 (ACE2), transmembrane serine protease 2 (TMPRSS2) and dipeptidyl peptidase-4 (DPP4, also known as CD26), are known to help in priming of the S-protein and subsequent entry of the virus particle into the host cell (Li et al. 2005; Hoffmann et al. 2016; Cascella et al. 2020; Ibrahim et al. 2020; Vankadari and Wilce 2020). However, the specific host cell factors that facilitate SARS-CoV-2 entry into human cells remain elusive. Structural or expression variations in these cellular receptors (host factors) could influence the cell adhesion, cellular entry and virulence of SARS-CoV-2. Population-level differences in incidence and CFR could be attributable to functional genetic variations in these receptors. Therefore, detailed genetic analyses of these receptors are warranted for future epidemiological, molecular and pharmaceutical research to tackle SARS-CoV-2 or related coronavirus outbreaks.

A recent comparative genetic analysis of the ACE2 receptor was unable to confirm population-specific resistance to SARS-CoV-2 infection, but showed promise for further investigation (Cao et al. 2020). Here, for the first time, we report a comparative genetic study of TMPRSS2 and CD26 and their predicted molecular interactions with the SARS-CoV-2 spike protein.

Methods and results

Common genetic variants located in the coding (nonsynonymous) and regulatory regions of TMPRSS2 and CD26 were evaluated. Genotypes of these variants in 26 populations from five major global regions were obtained from the 1000 Genomes database (https://www.internationalgenome.org) and used for comparative analysis. A common missense variant, rs12329760: C>T (Chr21:41480570), showed noticeable variation in allele frequency among populations (table 1 in electronic supplementary material). The highest frequencies of the minor T allele were observed in the Chinese (CHB = 0.41 and CHS = 0.38) and Japanese (JPT = 0.39) populations. Considerably lower allele frequencies were observed in other major populations (table 1 in electronic supplementary material). Regional linkage disequilibrium (LD) analyses showed the presence of very close, high-range LD and haplotype patterns in CHS compared with other populations (figure 1, a‒e in electronic supplementary material). Fourteen common (allele frequency > 0.01) regulatory variants are present in the regulatory regions of TMPRSS2. Four of these variants (rs112657409, rs11910678, rs77675406 and rs713400) are independent of the nonsynonymous SNP rs12329760 and have significant eQTL effects on MX1 and TMPRSS2 in different tissues (table 2 in electronic supplementary material). The GTEx database was used to evaluate the eQTLs (Aguet et al. 2019). MX1 encodes a GTP-metabolizing protein that participates in the cellular antiviral response by antagonizing the replication of several RNA and DNA viruses (Jung et al. 2019). The 5′-flanking region of TMPRSS2 harbours SNP rs713400 beyond a CpG island (CG:156), and this SNP was found to influence the expression of TMPRSS2 (table 2 in electronic supplementary material). The frequency of the rs713400-T allele is considerably higher among East Asians (avg_freq = 0.28) than among Europeans (avg_freq = 0.11), South Asians (avg_freq = 0.13), Africans (avg_freq = 0.02) and Americans (avg_freq = 0.11) (table 2 in electronic supplementary material).
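The population comparison above rests on allele frequencies derived from genotypes. The sketch below shows, under stated assumptions, how an alternate-allele frequency can be computed from genotype counts for a biallelic SNP, and how the regional rs713400-T averages reported in the text compare as fold differences. The genotype counts are hypothetical placeholders, not 1000 Genomes data, and the function names are introduced here for illustration only.

```python
def alt_allele_frequency(n_ref_hom: int, n_het: int, n_alt_hom: int) -> float:
    """Alternate-allele frequency of a biallelic SNP from diploid genotype counts."""
    n_individuals = n_ref_hom + n_het + n_alt_hom
    if n_individuals == 0:
        raise ValueError("no genotypes supplied")
    # Each heterozygote carries one alternate allele, each alt homozygote two.
    return (n_het + 2 * n_alt_hom) / (2 * n_individuals)


# Hypothetical genotype counts (CC, CT, TT) for a single population; not real data.
example_freq = alt_allele_frequency(n_ref_hom=36, n_het=50, n_alt_hom=17)
print(f"Example T-allele frequency: {example_freq:.2f}")

# Regional average rs713400-T frequencies as reported in the text.
RS713400_T = {
    "East Asian": 0.28,
    "European": 0.11,
    "South Asian": 0.13,
    "African": 0.02,
    "American": 0.11,
}


def fold_difference(freqs: dict, reference: str) -> dict:
    """Ratio of the reference population's frequency to every other population's."""
    ref = freqs[reference]
    return {pop: ref / f for pop, f in freqs.items() if pop != reference}


folds = fold_difference(RS713400_T, "East Asian")
for pop, ratio in folds.items():
    print(f"East Asian vs {pop}: {ratio:.1f}-fold")
```

Fold differences of this kind are descriptive only; a formal comparison would also need sample sizes and a test of differentiation.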

Figure 1

(a) Docking results showing protein‒protein interaction between SARS-CoV-2 (yellow) (PDB:6VSB) and the modelled human TMPRSS2 protein (cyan); (b) ribbon and surface structure diagram showing the magnified protein–protein (SARS-CoV-2 and TMPRSS2) interaction region; (c) surface diagram showing the magnified protein–protein (SARS-CoV-2 and TMPRSS2) interaction region with interacting amino acid residues. The SARS-CoV-2 amino acid backbone is shown in yellow and the TMPRSS2 surface structure in cyan.

Another nonsynonymous variant, rs1129599 in CD26, with moderate effects on protein structure/function, was found to be population specific (table 1 in electronic supplementary material). This relatively rare variant is present only among Africans (avg_freq = 0.04) and is monomorphic in the rest of the world. As with TMPRSS2, variants around this region of CD26 show a very high degree of short-range LD in the southern Han Chinese population (CHS), which is absent in other major populations (figure 1, f‒j in electronic supplementary material).
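For readers unfamiliar with how pairwise LD is quantified, the following minimal sketch computes the two standard measures, r² and |D′|, from phased haplotype counts at two biallelic SNPs. It is a generic textbook calculation, not the software used for the LD plots in this study, and the haplotype counts are hypothetical. Note that LD is undefined when either site is monomorphic, as rs1129599 is outside Africa; the function raises an error in that case.

```python
def pairwise_ld(n_AB: int, n_Ab: int, n_aB: int, n_ab: int) -> tuple:
    """Return (r_squared, abs_d_prime) for two biallelic SNPs.

    Arguments are counts of the four phased haplotypes, where A/a are the
    alleles at the first SNP and B/b the alleles at the second.
    """
    n = n_AB + n_Ab + n_aB + n_ab
    if n == 0:
        raise ValueError("no haplotypes supplied")
    p_A = (n_AB + n_Ab) / n
    p_B = (n_AB + n_aB) / n
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    if denom == 0:
        raise ValueError("LD is undefined when a site is monomorphic")
    # D: deviation of the observed AB haplotype frequency from independence.
    d = n_AB / n - p_A * p_B
    r_squared = d * d / denom
    # D' rescales D by its maximum possible magnitude given the allele frequencies.
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    return r_squared, abs(d) / d_max


# Hypothetical haplotype counts: the two SNPs are perfectly correlated.
print(pairwise_ld(50, 0, 0, 50))
# Hypothetical haplotype counts: the two SNPs are statistically independent.
print(pairwise_ld(25, 25, 25, 25))
```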

The regulatory SNP rs13015258 (G>T) at the exon 1 start site (chr2:162930725) of CD26 was found to have a significant eQTL effect (P = 2.50E−07) on the expression of CD26 in lung tissue (table 2 in electronic supplementary material). This site falls within a 646-nucleotide-long CpG island (figure 2 in electronic supplementary material), and hypermethylation of the C allele (complementary to G) of this SNP was found to significantly (P = 0.001) repress the expression of CD26 in human visceral adipose tissue (Turcot et al. 2011). The presence of the C allele further promotes the binding of several transcription factors (table 2 in electronic supplementary material) and regulates the gene's own expression. CD26 is a cell-surface protease expressed in a variety of tissues, including specific sets of T-cells, adipose tissue, and endothelial and epithelial cells. Its soluble form is also present in plasma and body fluids. Its role in glucose metabolism is well established. Significantly higher expression of CD26 has been associated with type 2 diabetes (Qiao et al. 2019). The frequencies of the G and T alleles of rs13015258 vary considerably among populations: G is the major allele among Africans, Europeans and South Asians, but the minor allele among Americans and East Asians (table 2 in electronic supplementary material).

Figure 2

(a) Surface structure showing protein–protein interaction between SARS-CoV-2 (grey) (PDB: 6VSB) and human CD26 (orange) (PDB: 4QZV); (b) ribbon diagram showing magnified protein–protein (SARS-CoV-2 and CD26) interaction region; (c) surface diagram showing magnified protein–protein (SARS-CoV-2 and CD26) interaction region with interacting amino acid residues.

Recently, Vankadari and Wilce (2020) predicted the homotrimer structure of the SARS-CoV-2 spike glycoprotein (modelled protein) and its interaction with the human CD26 protein (crystallized). Here we report, for the first time, the computationally predicted interaction of the crystallized SARS-CoV-2 spike glycoprotein with the human CD26 protein. The study also reports the protein‒protein interaction between the SARS-CoV-2 spike glycoprotein and the modelled TMPRSS2 protein.

The structures of the SARS-CoV-2 spike glycoprotein (PDB:6VSB) and the human CD26 receptor (PDB:4QZV) were retrieved from the Protein Data Bank (PDB). The amino acid sequence of the human TMPRSS2 protein was retrieved from the UniProt database. The three-dimensional structure of the TMPRSS2 protein was modelled using the SWISS-MODEL structural bioinformatics server, followed by refinement of the structure using the ModRefiner server (figure 3, a‒c in electronic supplementary material). In humans, the type II transmembrane serine proteases (11 in number) are divided into four subfamilies. Hepsin (also known as TMPRSS1) and TMPRSS2 belong to the same subfamily (Hepsin/TMPRSS) and share a similar extracellular carboxy-terminal protease domain (Sakai et al. 2014; Mukai et al. 2020). Thus, the serine protease Hepsin (PDB:5CEL) was used as the template for homology modelling of the TMPRSS2 protein. Protein‒protein docking of the SARS-CoV-2 spike glycoprotein with the modelled human TMPRSS2 protein and the crystallized CD26 protein was performed using the ClusPro server (Kozakov et al. 2017). LigPlot+ v2.2 software was used to identify the types of interaction between the interacting proteins (Wallace et al. 1995). Structure representation and visualization were performed using PyMOL software (DeLano 2002). Docking of the SARS-CoV-2 spike glycoprotein with the modelled TMPRSS2 protein (figure 1) and with CD26 (figure 2) showed a large interface between the proteins. The amino acid residues involved in these interactions are listed in table 3 in electronic supplementary material. Neither of the missense variants, rs12329760 of TMPRSS2 and rs1129599 of CD26, was found to be directly engaged in the protein‒protein interaction with the S1 domain of the viral spike protein. However, their indirect role in influencing this protein‒protein interaction is beyond the scope of this study.
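The check of whether a variant residue takes part in the docked interface can be illustrated with a simple distance criterion. The sketch below is not the LigPlot+ analysis used in this study; it is a simplified stand-in that flags residue pairs having any two atoms within a cutoff distance. The 4.0 Å cutoff, the residue labels and the coordinates are all hypothetical, chosen only to make the example self-contained.

```python
from math import dist  # available in Python 3.8+


def interface_residues(chain_a: dict, chain_b: dict, cutoff: float = 4.0) -> tuple:
    """Return the residues of each chain that have an atom within `cutoff` of the other chain.

    Each chain maps a residue label to a list of (x, y, z) atom coordinates.
    """
    contacts_a, contacts_b = set(), set()
    for res_a, atoms_a in chain_a.items():
        for res_b, atoms_b in chain_b.items():
            if any(dist(p, q) <= cutoff for p in atoms_a for q in atoms_b):
                contacts_a.add(res_a)
                contacts_b.add(res_b)
    return contacts_a, contacts_b


# Hypothetical toy coordinates (in angstroms); not taken from 6VSB, 4QZV or any model.
spike = {"S:1": [(0.0, 0.0, 0.0)], "S:2": [(10.0, 0.0, 0.0)]}
receptor = {"R:1": [(3.0, 0.0, 0.0)], "R:2": [(30.0, 0.0, 0.0)]}

spike_contacts, receptor_contacts = interface_residues(spike, receptor)

# Hypothetical label standing in for the residue altered by a missense variant.
variant_residue = "R:2"
print("Variant residue at interface:", variant_residue in receptor_contacts)
```

With real structures, the coordinate dictionaries would be filled from the docked complex file, and the residue altered by each missense variant would be looked up in the resulting contact sets.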

Discussion

Following the COVID-19 outbreak, this is the first report assessing the genetic susceptibility conferred by TMPRSS2 and CD26 (DPP4) to SARS-CoV-2 infection. Our in silico predictions also support molecular interactions between the SARS-CoV-2 spike protein and human TMPRSS2 and CD26/DPP4. This study highlighted the differential allele frequencies of two missense variants, one in TMPRSS2 (rs12329760) and one in CD26 (rs1129599), across global populations. Noticeable LD differences at these loci probably indicate the presence of different haplotypes that could influence overall receptor function. These two SNPs are not located within the receptor-ligand (S1 domain of SARS-CoV-2) binding site. Further study is warranted to determine their effect on protein‒protein interaction (PPI) dynamics, protein structure stability and turnover. Four regulatory SNPs from TMPRSS2 (rs112657409, rs11910678, rs77675406 and rs713400) and one from CD26 (rs13015258) have a significant role in regulating the expression of key genes (TMPRSS2, CD26 and MX1) that could be involved in SARS-CoV-2 infection. Epigenetic modification at the rs13015258-C allele that induces CD26 overexpression could explain the higher fatality rate of SARS-CoV-2 infection among patients with type 2 diabetes. The preliminary in silico predictions of the interactions of TMPRSS2 and CD26 with the SARS-CoV-2 S-protein need to be confirmed by detailed molecular experiments. Findings from this study could guide further genetic epidemiological studies and drug development or drug repurposing to tackle COVID-19.