Phylo-geo-network and haplogroup analysis of 611 novel coronavirus (SARS-CoV-2) genomes from India

Evolution of SARS-CoV-2 in India across 51 haplogroups based on 152 parsimony informative sites revealed B6 and B1 (Pangolin) and A2a (Covidex) as the most prevalent lineages and clade, respectively.


Introduction
Page 3 Line 1 and 2: This is true. However, as this experiment was not carried out in the present study, a suitable and appropriate reference should be provided to substantiate this
Line 5: You can kindly include a reference to include the size of SARS-CoV-2 (nCoV-19 in Authors' words)
Line 9: Kindly update this statistic (28th August is a bit obsolete due to the increasing burden of COVID-19).
Line 10: It is pertinent to give the full meaning of abbreviations at first mention, kindly look into this as this was a common trend throughout the manuscript (and even the abstract). (NCBI, MSA, GISAID, WHO, etc)
Line 10-12: Revise this and you also need to update the data. You gave a statistic report from worldometer and acknowledged WHO. In case, you'd prefer WHO data; you can access that via covid19.who.int
Line 15-18: This should definitely be reviewed. How come articles published in 2004 (Peiris et al) and 2012 (Zaki et al) would serve as references for symptoms observed in nCoV-2019? Definitely not possible
Line 21-23: Somehow clumsy for me to understand. Revise this claim or provide a suitable reference
Page 4 Line 6-8: You are right about this. There is an increased burden of COVID-19 in India and most parts of the world. Nevertheless, I will appreciate optimism. With the heroic effort of researchers across the globe (including you), vaccines are being produced in large volumes. More so, despite the demography of India, adherence to preventive guidelines will go a long way in curtailing the menace of COVID-19.

Materials and Methods
Sequence acquisition
In the first sentence, you're stated to have mined sequences both from NCBI and GISAID but in the latter sentences, nothing was made mention of data filter on NCBI, how many sequences were gotten from NCBI and GISAID respectively, and the identifications of sequences mined from NCBI SARS-CoV-2 database. Likewise, the link you provided for NCBI is not the designated link for SARS-CoV-2. All these should be clearly stated for clarity and to improve the reproducibility of your method.
It is also a great concern for me since these 611 sequences were derived on 6th June 2020, which is already more than 6 months. I hope the claims would still be relevant as of when this study was done due to the ever-changing dynamics of the SARS-CoV-2.

Lineage and Subtyping Analysis
Kindly revise line 1 and 2 to aid clarity.

Results and Discussion
Phylogenetic network analysis
Line 1: The alignment of genomes (include number of genomes; 611 and name of the organism; nCoV-2019)
Page 7 Line 4 Can you clearly state accession no as accession number
Table 2
S/No 3 should refer to ORF1b and not ORF1ab, since S/No 2 already highlighted the PI sites in ORF1a
Under the genome region (column 2), since you are referring to base pairs and nucleotide positions, you should refer to SARS-CoV-2 genes instead of proteins. More so, ensure genes are in italics when this is revised.
We know of nucleocapsid, can you differentiate what are N and NC (as stated in column 2)? Kindly make this clear

Lineage and Subtype Analysis
Line 3-8: can you substantiate this with reference(s) highlighting the index cases of COVID-19 in India from the countries stated? This can as well serve as supporting information for the phylogenetic lineage

Conclusion
Line 1-2: Revise, this is not clear
Line 3: The strain or variant most prevalent in India is more appropriate

Reviewer #2 (Comments to the Authors (Required)):
In the manuscript "Phylo-geo-network and haplogroup analysis of 611 novel Coronavirus (nCov-2019) genomes from India" Laskar and Ali analysed the phylo-geo-network of SARS-coV2 genomes to understand virus evolution in different geographical regions of India. The analysis of rapidly evolving viruses is very important to understand the evolution and geographical distribution of different virus variants.
In this study, the authors extracted 611 full genomic sequences of SARS-CoV-2 from the different states of India. First, genomic sequence alignment identified 270 parsimony informative sites. Second, network analysis revealed that the reference sequence NC_045512.2 (Wuhan, China) forms the core haplogroup, with 157 identical sequences present across 16 states of India. Further, in the comparative analysis of haplogroups, the authors observed local evolution of SARS-CoV-2 genomes. Lastly, the data show that B6 and B1 are the two most common lineages, whereas strains in the A2a clade appear to be the most predominant in India.
Comments:
1. Indian territories are very diverse in terms of geographical conditions. Are differences in the haplogroup distribution in different states somehow linked to varying geographical conditions, or are there other reasons?
2. Does the heterogeneity in haplogroup distribution in different states depend on the number of sequences analysed from each state? It would be interesting to know the distribution if the same number of genomes were analysed from each state.
3. A variant of SARS-CoV-2 with a D614G mutation in the gene encoding the spike protein emerged in the beginning of 2020. After a couple of months, the D614G variant became dominant over the initial SARS-CoV-2 strain originally identified in Wuhan, China. Have the authors detected the evolution/mutation of the D614G spike variant in India? If yes, what is the level of distribution of D614G variants/mutants in different states of India?
4. Recently, a new variant of SARS-CoV-2, called VUI 202012/01, has been identified through viral genomic sequencing in the United Kingdom (UK). Its genome harbours multiple mutations (deletion 69-70, deletion 144, N501Y, A570D, P681H, T716I, S982A, D1118H) in the spike coding gene. Genomic sequence analysis revealed that the current increase in SARS-CoV-2 cases in the UK is associated with the VUI 202012/01 variant. This VUI 202012/01 SARS-CoV-2 variant is now present not only in the UK; small numbers of cases have also been detected in other countries, including India. It would be intriguing to know which haplogroup this variant belongs to, and I suggest the authors include this data in the revised manuscript.
The findings are a novel contribution to the existing knowledge about Phylo-geo-network analysis of SARS-coV2 genomes across the different states of India. Overall, the present manuscript is well conceived, planned and executed. However, there are few minor concerns which must be addressed to further improve the quality of the manuscript.
Reviewer #3 (Comments to the Authors (Required)): Overall, I find this paper to be an interesting addition to the current COVID-19 literature, especially as it focuses on India. Importantly this paper highlights local viral evolution and low overall genome evolution in relation to the Wuhan Genome.
However, I have the following comments.
1. The time when the genomes were obtained and analysed should be emphasized.
2. It should be emphasized that some of the language is a tad too simplistic in some paragraphs. For example, some lines of the abstract, the introduction and the results/discussion sections.
3. Some sentences are not clear in both context and structure. There are also minor grammatical errors and tense mistakes that create some confusion with understanding the work that was done.
4. The legend for figure 1 needs to be clearer, including the grammar. Actually, all figure legends should be rewritten to be clearer and easier to follow.
5. An explanation of the rationale behind the choice of methods is lacking. To fix this, I suggest that the result and discussion sections be separated, and rationale behind methods be explained in more depth in the discussion section.
Provided with the revised manuscript.

Reviewers Comments [RC] Authors Response [AR]
This present study examined the phylogenomic network of nCov-2019 in India identifying the common haplogroups and lineage of the virus in the study population.
My major concern is the use of data generated since June 2020, having in mind the high mutation rate of SARS-CoV-2 …
… (Figure 3) for sequence extraction has also been provided in the revised manuscript.
It is also a great concern for me since these 611 sequences were derived on 6th June 2020, which is already more than 6 months. I hope the claims would still be relevant as of when this study was done due to the ever-changing dynamics of the SARS-CoV-2.
We agree that SARS-CoV-2 constantly accrues mutations, and we will update the data presented herein as a short report/update, as per journal norms, at a later stage.

Lineage and Subtyping Analysis
Kindly revise line 1 and 2 to aid clarity.
Edited in the revised manuscript.

Results and Discussion
Phylogenetic network analysis
… Table 3.

Conclusion
Line 1-2: Revise, this is not clear
Line 3: The strain or variant most prevalent in India is more appropriate
Edited in the revised manuscript.

Reviewers Comments [RC] Authors Response [AR]
In the manuscript … to be the most predominant in India.
We thank the reviewer for a positive summary of our work.
Overall, the present manuscript is well conceived, planned and executed.
However, there are few minor concerns which must be addressed to further improve the quality of the manuscript.
We thank the reviewer for the positive remarks and have addressed all the issues raised.

Reviewers Comments [RC] Authors Response [AR]
Overall, I find this paper to be an interesting addition to the current COVID-19 literature, especially as it focuses on India. Importantly this paper highlights local viral evolution and low overall genome evolution in relation to the Wuhan Genome.
We thank the reviewer for a positive summary of our work.
Actually, all figure legends should be rewritten to be clearer and easier to follow.
Legends have been revised accordingly.
5. An explanation of the rationale behind the choice of methods is lacking.
To fix this, I suggest that the result and discussion sections be separated, and rationale behind methods be explained in more depth in the discussion section.
Results and Discussion are presented as separate sections in the revised manuscript, and the rationale behind the methods is included in the Discussion.