Exploratory meta-analysis of hypoxic transcriptomes using a precise transcript reference sequence set

Exploratory meta-analysis of hypoxic RNA-seq data using FANTOM-CAT as a reference transcriptome facilitates the evaluation of relationship between sense–antisense transcripts from the same locus.

1. In this study, hypoxia-responsive genes were identified by meta-analysis. However, the hypoxia-responsive genes might be changed in a tissue-dependent manner. Would it be possible to show the hypoxia-related genes based on tissue types or cell types? 2. It is unclear the reason to show "Survey of the number of papers referring FANTOM". 3. As shown in Figure 4, some sense-antisense pairs expressed oppositely. Is there any explanation for this phenotype? Have the authors examined chromatin state by ATAC-seq or ChIP-seq analysis? It would be more informative to show chromatin status based on the classified four categories. 4. It may be switched the position of Figure 2A and 2B because Figure 2B was explained first in the text. 5. Figure 2C and 2D did not appear in the text.
Reviewer #2 (Comments to the Authors (Required)): In this report, Ono et al investigated Hypoxia RNA-seq datasets and evaluated the effects on non-coding RNAs using integration with Fantom capabilities. They identify a significant level of hypoxia on the expression of non-coding RNAs, mostly the antisense variety. In addition, they identified how their regulations compares to their sense version. This study adds new insights into the mechanisms controlled by oxygen reduction in the cell. The data is robust and signficant. However, my only concern is the lack of discussion on how these non-coding RNAs are influencing cellular response. In the anti-sense RNAs that share same control as their sense conter-part is there a known function? RNA-decay, chromatin organisation, and specific sequence or motif that is over represented? Also, I assume these are shared amongst all the RNA-seq experiment analysed or are they only present in a subset? If so, is there a tissue specificity or cell background specificity?
Reviewer #3 (Comments to the Authors (Required)): The manuscript entitled "Exploratory Meta-Analysis of Hypoxic Transcriptomes Using a Precise Transcript Reference Sequence Set" by Ono et al, describes the characterization of the transcriptional response to hypoxia both at the level of coding and noncoding genes. To this end, the authors perform two analyses, one using RNAseq data quantified using the GENCODE transcriptome reference, and then a second one using the expanded meta-transcriptome assembly produce by the FANTOM consortium (the FANTOM-CAT reference), which accounts for a large collection of genes, encompassing both coding and noncoding, including many that are novel. In this study, the authors build upon their previous meta-analysis of the expression response to hypoxia, which revealed that many genes related to processing non-coding RNA were among those modulated by hypoxia. First the authors confirm their previous findings using coding genes from GENCODE, identifying pathways and processes downstream hypoxia, then they repeat the analysis with the FANTOM-CAT reference, quantifying the ncRNA associated with hypoxia, many of which are anti-sense transcripts of known coding genes.
This work is mostly descriptive and there are several aspects that need improvement, especially in terms of quantitative versus qualitative analyses, as explained below. Furthermore, the use of the FANTOM-CAT transcriptome to quantify gene expression using data from the public domain is not novel (see PMID:32079618, "Recounting the FANTOM CAGE-Associated Transcriptome" by Imada et al). Such FANTOM-CAT-recount2 gene expression atlas has been already successfully used, for instance, to quantify gene expression in prostate cancer (PMID:34311724, "Transcriptional landscape of PTEN loss in primary prostate cancer" by Imada et al), and to assist assigning a potential function to novel lncRNAs by the FANTOM6 consortium (PMID:32718982, "Functional annotation of human long noncoding RNAs via molecular phenotyping", by Ramilowski et al). The authors should include these citations and acknowledge previous work in the field.
Page 9, lines 137-139, Content: "This investigation revealed the fact that there were several cases where hypoxia-responsive transcripts with high HNf-scores were located within the antisense regions of genes with high HNg-scores (BHLHE40-AS1, SLC2A1-AS1)." Comment: Does this mean that both sense and anti-sense genes are affected by hypoxia? If so, is this observed more often than one would expect by chance? Is this observation only qualitative? Can the authors perform a more quantitative analysis on this point?
Page 12, lines 159-161, Content: "Further evaluation also revealed that the Top 10 antisense transcripts in the UP200 gene list were dominated by both short and long ncRNAs (lncRNAs), while the bottom 10 transcripts from this group were shown to be predominantly mRNA transcripts." Comment: Is this different from what one would be expecting by chance alone? In other words, is there a way to test if this is relevant and not just anedoctal? See my previous comment about this above.
Page 12, lines 162-170, Content: "Interestingly, among the antisense transcripts from the DOWN200 gene set, phosphoglycerate kinase 1 (PGK1), which was also included in the UP200 gene set, had the highest HNf-score. Given this, we homed in on PGK1 and its antisense TATA-box binding protein associated factor 9b (TAF9B), evaluating their chromosomal localization using UCSC Genome Browser and HN-score at the gene level (Fig. 4B, C). This analysis revealed that while these two genes are not described as having a sense-antisense relationship when evaluated using the UCSC or GENCODE datasets, FANTOM-CAT-based analysis suggests that these two transcripts are actually in a sense-antisense relationship. By using FANTOM-CAT as a reference, we were able to present the possibility that these two genes affect the regulation of expression." Comment: It is not clear the significance of this finding. Why is this important? Isn't this to be expected given that the ANTOM-CAT meta-transcriptome contains more genes, both coding and non-coding?
Pages 16-17, lines 195-197, Content: "and then we expanded this evaluation to any of the hypoxia-responsive genes annotated in the FANTOM-CAT database, which includes both coding and non-coding elements ( Fig. 1)" Comment: It is not clear why the author decided to repeat the analysis twice, and not just once with the FANTOM-CAT reference, which contains both coding and non-coding genes, and encompasses GENCODE. A better approach would be to perform the analysis just once, with the FANTOM-CAT, splitting coding and non-coding genes.
Page 17, lines 198-199, Content: "We believe that these data sets and this analysis approach will help to further elucidate the mechanisms regulating the hypoxic response." Comment: Would it be possible to go a little further here and try to annotate these genes looking at other conditions for instance tissue expression in gtex, on other date sets... The FANTOM-CAT-recount2 gene expression atlas would be ideal for this analysis.
Page 18, lines 224-225, Content: "Therefore, we assumed that this dataset is unlikely to be commonly applied as a reference point for RNA-seq." Comment: This is not true, it has been used already multiple times, see my comment above.
Page 20, lines 262-264, Content: "This revealed that 75% of the UP and DOWN regulated antisense transcripts presented with a similar pattern of expression to their sense partner, while the other 25% of antisense transcripts were shown to be regulated in opposition to their sense partner (Fig. 4A)." Comment: Is this true only for the hypoxia genes or for all genes? This statement should be supported a by a formal quantitative analysis.
Finally, code and scripts used for preprocessing and analyze the gene expression data should be included in the interest of robustness, transparency, and reproducibility. The authors could use a gitHub repository or include the scripts as supplementary material.
1st Authors' Response to Reviewers September 8, 2022 The manuscript by Ono and Bono describes a novel approach to exploring hypoxia-responsive genes by meta-analysis. Using two reference datasets such as GENCODE and FANTOM-CAT, top and bottom genes were identified based on HNg-score, and enrichment analysis showed enriched GO in top and bottom genes. Using FANTOM-CAT data, HNg-score was evaluated, and the relationship between antisense and sense genes in downand up-regulated genes by hypoxia. The approach is interesting because this could be robust and novel; however, there are several concerns before considering further process. Specific comments are shown below.
1. In this study, hypoxia-responsive genes were identified by meta-analysis. However, the hypoxia-responsive genes might be changed in a tissue-dependent manner. Would it be possible to show the hypoxia-related genes based on tissue types or cell types? 5. Figure 2C and 2D did not appear in the text.
Response1-5: Thank you for pointing this out. We have included these (now Figure 3C and 3D) in the text of the relevant section. "In this enrichment analysis, the ncRNA metabolic process was further subdivided into the following Gene Ontology (GO) terms: ncRNA processing, ribosome biogenesis, ribonucleoprotein complex biogenesis, rRNA processing, and rRNA metabolic process (Fig. 3C). The results of rRNA GO network analysis were visualized in Upset plots, which showed that the ncRNA metabolic process is heavily reliant on most of these rRNA metabolic genes (Fig. 3D)." (Lines 121-125) Reviewer #2 (Comments to the Authors (Required)): In this report, Ono et al investigated Hypoxia RNA-seq datasets and evaluated the effects on non-coding RNAs using integration with Fantom capabilities. They identify a significant level of hypoxia on the expression of non-coding RNAs, mostly the antisense variety. In addition, they identified how their regulations compares to their sense version. This study adds new insights into the mechanisms controlled by oxygen reduction in the cell. The data is robust and signficant. However, my only concern is the lack of discussion on how these non-coding RNAs are influencing cellular response.
Response2-1: Thank you for your suggestion. We could not clarify the intracellular response of non-coding RNAs (ncRNAs), but it is worthwhile to evaluate the hypoxic response of transcripts including ncRNAs comprehensively by using FANTOM-CAT as a reference sequence. Based on your suggestion, we have added the following. "We first evaluated the ncRNA metabolic process and provided a dataset that will help generate hypotheses for clarifying specific ncRNA intracellular responses. "(Lines 319-320) In the anti-sense RNAs that share same control as their sense conter-part is there a known function? RNA-decay, chromatin organisation, and specific sequence or motif that is over represented?
Response2-2: Several functions are known to be regulated in the same direction. We conditions. As you can see, there was no specificity in a specific condition. We believe it depends on oxygen concentration, treatment time, and cell type.
Reviewer #3 (Comments to the Authors (Required)): The manuscript entitled "Exploratory Meta-Analysis of Hypoxic Transcriptomes Using a Precise Transcript Reference Sequence Set" by Ono et al, describes the characterization of the transcriptional response to hypoxia both at the level of coding and non-coding genes. To this end, the authors perform two analyses, one using RNAseq data quantified using the GENCODE transcriptome reference, and then a second one using the expanded meta-transcriptome assembly produce by the FANTOM consortium (the FANTOM-CAT reference), which accounts for a large collection of genes, encompassing both coding and non-coding, including many that are novel. In this study, the authors build upon their previous meta-analysis of the expression response to hypoxia, which revealed that many genes related to processing non-coding RNA were among those modulated by hypoxia. First the authors confirm their previous findings using coding genes from GENCODE, identifying pathways and processes downstream hypoxia, then they repeat the analysis with the FANTOM-CAT reference, quantifying the ncRNA associated with hypoxia, many of which are anti-sense transcripts of known coding genes.
This work is mostly descriptive and there are several aspects that need improvement, especially in terms of quantitative versus qualitative analyses, as explained below. refers to the two genes only. The sentence correspondence was inappropriate, so the wording has been changed. "Number of transcripts derived from mitochondrial DNA and antisense analysis was reduced, suggesting a transcriptional regulation relationship in both sense and antisense genes (PGK1 and TAF9B). "(Lines 524-526) Page 6, lines 106-107, Content: "Enrichment analysis focusing only on protein-coding genes revealed that the top 200 genes identified by HNg-score, a measure of susceptibility to hypoxia, were enriched for hypoxia response-" Comment: This is to be expected. Can the authors i elaborate on the way this HN score is calculated? Also, is the correct abbreviation for the score HN, HNg,or HNf (see fig 1) Response3-2: The following text has been added for the clarification.
"From a sample of hypoxia-related datasets obtained from public databases, we selected sample pairs of hypoxia and normoxia (HN-pairs). Based on these HN-pairs, plus 1 was calculated for each HN-pair if the expression variation was greater than 1.5-fold compared to both and minus 1 if the variation was less than the reciprocal of 1.5-fold.
These values were summed for each gene and designated as the hypoxia-normoxia score (HN-score). The HN-score qualitatively reflects hypoxia-responsiveness, unlike the method used in general gene expression analysis. This HN-score was named as the HNg-score when the reference sequence was GENCODE and HNf-score when the sequence was based on FANTOM-CAT. We subjected 200 genes with high HNg-scores and 200 genes with low HNg-scores to enrichment analysis and evaluated whether the genes selected based on the HNg-score could be used to examine hypoxia responsiveness."(Lines 102-111) Page 7, lines 111-113, Content: "The ncRNA metabolic process was then further subdivided into the following gene ontology terms: ncRNA processing, ribosome biogenesis, ribonucleoprotein complex biogenesis, rRNA processing, and rRNA metabolic process." Comment: It look like this sub-division is based on gene ontology however it is not clear if the GO terms indicated here resulted from some formal analysis, or whether they were selected with on arbitrary approach. Please explain. Response3-6: Thank you for pointing this out. We added the following sentence to ensure that it was appropriately worded.
"FANTOM-CAT is a comprehensive catalog of human lncRNAs that improves upon existing lncRNA transcript models. The results obtained using this dataset have greatly contributed to life science research (Imada et al. 2020;Ramilowski et al. 2020;Imada et al. 2021). Despite the usefulness of this dataset, we predicted that it is used less frequently for RNA-sequencing (RNA-seq) quantification compared to GENCODE; we quantitatively examined this hypothesis. If FANTOM-CAT is used less frequently, results obtained using this database may show a lower influence of publication bias (Ono and Bono 2021), as demonstrated in our previous report."  Page 9, line 132, Content: "The HNf-score for each transcript was quantified using FANTOM-CAT as a reference." Comment: Was this done for all transcripts and genes in this reference transciptome, or only for the non-coding ones? Response3-7: We quantified for all transcripts. The text was changed as follows. Response3-9: Since the HNf-score was used as the index for evaluation, the results are not coincidental; the following was added to indicate that the analysis was based on the HNf-score. "We also investigated transcripts with high or low HNf-scores to qualitatively evaluate hypoxia responsiveness.."  Page 12, lines 162-170, Content: "Interestingly, among the antisense transcripts from the DOWN200 gene set, phosphoglycerate kinase 1 (PGK1), which was also included in the UP200 gene set, had the highest HNf-score. Given this, we homed in on PGK1 and its antisense TATA-box binding protein associated factor 9b (TAF9B), evaluating their chromosomal localization using UCSC Genome Browser and HN-score at the gene level (Fig. 4B, C). This analysis revealed that while these two genes are not described as having a sense-antisense relationship when evaluated using the UCSC or GENCODE datasets, FANTOM-CAT-based analysis suggests that these two transcripts are actually in a sense-antisense relationship. By using FANTOM-CAT as a reference, we were able to present the possibility that these two genes affect the regulation of expression." Comment: It is not clear the significance of this finding. Why is this important? Isn't this to be expected given that the FANTOM-CAT meta-transcriptome contains more genes, both coding and non-coding?
Response3-10: An interesting aspect of this analysis is that relationships not found by GENCODE or UCSC were only be revealed by utilizing FANTOM-CAT. We have changed the wording as follows.
"By using FANTOM-CAT as a reference, we were able to present for the first time the possibility that these two genes affect the regulation of expression."(Lines 187-188) Pages [16][17]Content: "and then we expanded this evaluation to any of the hypoxia-responsive genes annotated in the FANTOM-CAT database, which includes both coding and non-coding elements (Fig. 1)" Comment: It is not clear why the author decided to repeat the analysis twice, and not just once with the FANTOM-CAT reference, which contains both coding and non-coding genes, and encompasses GENCODE. A better approach would be to perform the analysis just once, with the FANTOM-CAT, splitting coding and non-coding genes.
Response3-11: As you pointed out, it would be ideal to perform the analysis using FANTOM-CAT from the beginning. However, we had already conducted our analysis focusing only on coding genes in our previous studies. Therefore, we decided to analyze FANTOM-CAT after carefully examining the results of GENCODE first. In order to accurately describe the results, we have described them as follows.
"Our previous studies focusing on coding genes have revealed a clear publication bias when evaluating gene expression changes in response to hypoxia and identified a novel set of largely uncharacterized hypoxia-responsive elements (Ono and Bono 2021) "  Page 17, lines 198-199, Content: "We believe that these data sets and this analysis approach will help to further elucidate the mechanisms regulating the hypoxic response." Comment: Would it be possible to go a little further here and try to annotate these genes looking at other conditions for instance tissue expression in gtex, on other date sets... The FANTOM-CAT-recount2 gene expression atlas would be ideal for this analysis. Page 18, lines 224-225, Content: "Therefore, we assumed that this dataset is unlikely to be commonly applied as a reference point for RNA-seq." Comment: This is not true, it has been used already multiple times, see my comment above.
Response3-13: The wording was changed as follows.
" As this dataset may not be commonly applied as a reference point for RNA-seq, we investigated the number of articles referencing both "RNA-seq" and "FANTOM" in PMC."  Page 20, lines 262-264, Content: "This revealed that 75% of the UP and DOWN regulated antisense transcripts presented with a similar pattern of expression to their sense partner, while the other 25% of antisense transcripts were shown to be regulated in opposition to their sense partner (Fig. 4A)." Comment: Is this true only for the hypoxia genes or for all genes? This statement should be supported a by a formal quantitative analysis.
Response3-14: The analysis is for all transcripts. The wording has been changed as follows.
"We then evaluated the HNf-score of all antisense transcripts of these genes."(Lines

283-284)
Finally, code and scripts used for preprocessing and analyze the gene expression data should be included in the interest of robustness, transparency, and reproducibility. The authors could use a gitHub repository or include the scripts as supplementary material.