Comprehensive RNA-seq transcriptomic profiling in the malignant progression of gliomas

Zhao, Zheng; Meng, Fanlin; Wang, Wen; Wang, Zheng; Zhang, Chuanbao; Jiang, Tao

doi:10.1038/sdata.2017.24

Download PDF

Data Descriptor
Open access
Published: 14 March 2017

Comprehensive RNA-seq transcriptomic profiling in the malignant progression of gliomas

Zheng Zhao¹^na1,
Fanlin Meng²^na1,
Wen Wang^1,3,4,
Zheng Wang¹,
Chuanbao Zhang¹ &
…
Tao Jiang^1,3,5,6

Scientific Data volume 4, Article number: 170024 (2017) Cite this article

8955 Accesses
168 Citations
3 Altmetric
Metrics details

Subjects

Abstract

Gliomas are the most common and lethal intracranial tumours. RNA sequencing technologies and advanced data analyses recently enabled the characterization of transcriptomic information, including protein-coding gene expression, non-coding gene expression, alternative splicing, and fusion gene detection, to facilitate detection of diseases and altered phenotypes. As a part of the Chinese Glioma Genome Atlas (CGGA) project, our aim was to delineate comprehensive transcriptome profiling in the malignant progression of human gliomas. Three hundred twenty five gliomas with different grades were collected over the past twelve years. Using the Illumina HiSeq 2,000 system, over 92 million high quality 101-bp paired-end reads were generated per sample, yielding a total of 30 billion reads. This comprehensive dataset will be useful to deepen the comprehensive understanding of gliomas, providing an opportunity to generate new therapies, diagnoses, and preventive strategies.

Design Type(s)	transcription profiling by high throughput sequencing design • parallel group design • disease state design
Measurement Type(s)	transcription profiling assay
Technology Type(s)	RNA-seq assay
Factor Type(s)	disease stage
Sample Characteristic(s)	Homo sapiens • glioma cell

Machine-accessible metadata file describing the reported data (ISA-Tab format)

WES data from 286 diffuse gliomas under the 2021 WHO Classification of Tumors of the Central Nervous System

Article Open access 11 November 2022

Zheng Zhao, Ke-Nan Zhang, … Tao Jiang

Variability in estimated gene expression among commonly used RNA-seq pipelines

Article Open access 17 February 2020

Sonali Arora, Siobhan S. Pattwell, … Hamid Bolouri

Transcriptome profiling-based identification of prognostic subtypes and multi-omics signatures of glioblastoma

Article Open access 22 July 2019

Junseong Park, Jin-Kyoung Shim, … Seok-Gu Kang

Background & Summary

Malignant gliomas are the most common and lethal primary brain tumours. According to the multicentre cross-sectional study on brain tumours (MCSBT) in China, age standardized prevalence of primary brain tumours is approximately 22.52 per million for all populations, and that of gliomas is 31.1% in those aged 20–59 years¹. Although there have been advances in the current standard of care, the survival for patients with gliomas has still remained unchanged, especially for aggressive gliomas^2,3. High-throughput whole-transcriptome sequencing technologies, i.e., RNA-seq, have the capability to capture the transcriptomic landscape with an unprecedented high-resolution. RNA-seq technology is able to generate comprehensive transcriptomic information at different levels, including quantification of protein-coding and non-coding gene expression, identification of non-coding RNAs (ncRNAs) and/or fusion genes, and determination of alternative splicing^4,5. Using RNA-seq to catalog abnormal events in the malignant progression of gliomas, from low-grade to high-grade dynamic range, can provide insights into disease susceptibility and identification of biomarkers, and could improve the translation of basic medicine to clinical therapy.

As a part of the Chinese Glioma Genome Atlas (CGGA, http://cgga.org.cn/) project, an RNA-seq profiling dataset for Chinese glioma patient cohort was generated to identify oncogenic fusions associated with glioma progression as previously described⁵. We revealed a novel, recurrent PTPRZ1-MET (ZM) fusion transcript in secondary glioblastomas, which was independently validated in other groups⁶. By combining this dataset with paired protein data, we also indicated that enhanced MET oncoprotein expression and phosphorylation were induced by glioma-specific ZM fusions⁷. To the best of our knowledge, this dataset represent the largest public RNA-seq data on Chinese glioma patient cohort. The principal aim of this study was to provide a comprehensive resource to deposit the raw RNA-seq data sets underpinning the advanced studies on dynamic transcriptomic signatures with the malignant progression of human gliomas. To this end, a total of three hundred and twenty five clinical samples were collected from several hospitals in China. These samples were chosen as they represented glioma patients with different grades of disease progression, allowing comparison of low-grade samples to their high-grade counterparts. The histopathological grades for these samples were as follows: 109 samples of grade II, 72 samples of grade III, and 144 samples of grade IV. As one of the deadliest cancers, grade IV glioma, termed glioblastoma (GBM), has a poor median survival time of only 12 to 18 months^2,8–10. Although most GBMs are primary, approximately twenty percent of them are secondary from the progression of previous low-grade gliomas of grade II or III¹¹. Therefore, it is becoming an urgent clinical task to identify early biomarkers for diagnosis and prognosis, as well as to explore the mechanisms underlying the development and progression of gliomas. This data set will therefore be of important and indispensable values for the glioma research community at large.

In this work, the RNA-seq libraries were sequenced using the Illumina HiSeq 2,000 platform. Approximately 30 billion 101-bp paired-end RNA-seq reads were generated, with an average of over 92 million sequence reads per sample. For each sample, we first aligned the raw reads to the genome using MapSplice (v2.1.7)¹² and annotated the reads to RefSeq, resulting in 17,527 genes. Next, we developed a web-based, user-friendly human glioma transcriptomic profiling database (GLIOMASdb, http://cgga.org.cn:9091/gliomasdb/) to deposit and catalog the transcriptomic profiles of the cohort of 325 glioma samples. In a few words, we have provided a comprehensive RNA-seq data resource on the malignant progression of Chinese glioma, which is useful to understand the progression of the disease. Furthermore, the data resource provides an opportunity to identify the biomarkers of early warning, diagnosis and prognosis, which have the potential to be applied to clinical treatments of glioma.

Methods

Clinical specimen collection

A total of 325 glioma samples were obtained from several hospitals in China from 2004 to 2016. All research performed was approved by the Tiantan Hospital Institutional Review Board (IRB) and kept consistent with the principles of the Helsinki Declaration. All the subjects were diagnosed with gliomas by consensus, according to central pathology reviews by independent board-certified neuropathologists and further graded based on the 2007 WHO classification. Written informed consent was obtained from all patients. The specimens were collected under IRB KY2013-017-01 and were frozen in liquid nitrogen within 5 min of resection. The following clinical information of each patient was also collected: age, sex, diagnosis, WHO grade, chemoradiotherapy and epilepsy therapy status, etc. (Table 1).

Table 1 Patient Characteristics.

Full size table

RNA-seq library preparation and sequencing

Prior to library preparation, total RNA was isolated using RNeasy Mini Kit (Qiagen) according to the manufacturer’s instructions. Pestle and QIAshredder (Qiagen) were used to disrupt and homogenize frozen tissue. The RNA intensity was checked using a 2,100 Bioanalyzer (Agilent Technologies) and only high quality samples with an RNA Integrity Number (RIN) value greater than or equal to 6.8 were used to construct the sequencing library. Typically, 1 μg of total RNA was used with the TruSeq RNA library preparation kit (Illumina) in accordance with low-throughput protocol, except that SuperScript III reverse transcriptase (Invitrogen) was used to synthesize first strand cDNA. After PCR enrichment and purification of adapter-ligated fragments, the concentration of DNA with adapters was determined with quantitative PCR (Applied Biosystems 7,500) using primers QP1 5′-AATGATACGGCGACCACCGA-3′ and QP2 5′-CAAGCAGAAGACGGCATACGAGA-3′. The length of the DNA fragment was measured using a 2,100 Bioanalyzer, with median insert sizes of 200 nucleotides. Then, RNA-seq libraries were sequenced using the Illumina HiSeq 2,000 Sequencing System. The libraries were prepared using the paired-end strategy with the read length of 101 bp. Base-calling was performed by the Illumina CASAVA v1.8.2 pipeline. The quality of the RNA-seq libraries was first evaluated using the FastQC v0.11.5 software (http://www.bioinformatics.bbsrc.ac.uk/projects/fastqc/) (Fig. 1). Then, the reads were subjected to standard quality control (QC) criteria according to the following parameters: (1) reads that aligned to primers and/or adaptors with no more than two mismatches, (2) reads with over 50% of low-quality bases (quality value ≤5) in one read, and (3) reads with over 10% unknown bases (N bases). After QC, over 30 billion filtered reads were used for further analysis.

**Figure 1: A representative example of quality control metrics of RNA sequenced reads as indicated by FastQC (sample: CGGA_909).**

RNA-seq data analysis

RNA-seq mapping and quantification were processed according to the pipeline in our previous research⁷. RNA-seq data from 325 grade II-IV gliomas was obtained from the Chinese Glioma Genome Atlas (CGGA) project. Applying the same TCGA mRNA-seq pipeline (University of North California RNA-seq workflow), expression analysis was performed with modifications. First, clean reads were aligned to the human genome reference (hg19) with MapSplice (v2.1.7)¹², followed by BAM file sorting with SAMtools (v0.1.9)¹³. Second, sequencing read counts for each RefSeq gene were calculated using RSEM (v1.2.15)¹⁴. Third, expression levels from different samples were normalized by the Trimmed Mean of M values (TMM) method¹⁵. Finally, the normalized expression levels of different samples were merged into a FPKM (fragments per kilobase transcriptome per million fragments) matrix for further analysis.

Differential expression analyses were performed to reveal critical genes associated with the malignant progression of gliomas by comparing gene expression in pairwise grades of gliomas. A gene was defined to be associated with the malignant progression of gliomas only if it presented differential expression in the comparison of any two grades. It was required for the genes of differential expression to satisfy both P-value and fold-change criteria simultaneously (|log2(Fold change)|≥1 and FDR<0.05). The P-values were obtained using the unpaired Student’s t-test and the false discovery rate (FDR) was controlled by the Benjamini and Hochberg algorithm.

A database for human gliomas on RNA-seq

To facilitate community-wide use of this unique RNA-seq data on the malignant progression of gliomas, we have developed a free, web accessible, user-friendly transcriptomic profiling database (GLIOMASdb, http://cgga.org.cn:9091/gliomasdb/) for Chinese gliomas. Here, users can easily explore the expression patterns of gene of interest (e.g., mRNA or lncRNA) in the malignant progression of gliomas using the BROWSE interface. In addition, GLIOMASdb currently provides data download including gene expression profiles and matched phenotypic data.

Code availability

The code is available on the Figshare (Data Citation 1).

Data Records

The raw fastq files for the RNA-seq data from the 325 gliomas in different stages of malignant progression, as part of the CGGA data sets, have been deposited in the SRA website (NCBI-SRA) under accession numbers SRP027383 (Data Citation 2) and SRP091303 (Data Citation 3). In addition, the complete datasets, including expression profiles (Data Citation 1) and matched clinical information (Data Citation 1), are available on the Figshare or downloaded from the GLIOMASdb web site: http://cgga.org.cn:9091/gliomasdb/.

Technical Validation

Quality control—RNA integrity

To determine the quality of the RNA-seq libraries, RNA intensity was checked using a 2,100 Bioanalyzer (Agilent Technologies). The integrity of the total RNA was calculated by the RNA Integrity Number (RIN) algorithm, which can be used to estimate RNA quality and degradation level. Higher RIN values indicate better integrity of total RNA, with the highest RIN value being 10. In this study, only total RNAs with a RIN value of greater than or equal to 6.8 were used to construct the sequencing library, which shows RNA with high integrity was used.

Quality validation and analyses

We applied FastQC v0.11.5 software to determine data quality and analysed several measures, including the sequence quality per base, GC content per sequence, sequence duplication levels, and quality score distribution over all sequences of the fastq files. A representative summary plot is depicted (CGGA_909). Here, the quality scores per base were high, with a median quality score above 30 suggesting high quality sequences across all bases (Fig. 1a). The GC distribution per base over all sequences was examined. The GC composition pattern was similar to theoretical distribution indicating that the samples were contaminant free (Fig. 1b). In addition, sequence complexity was examined. Approximately 15% of sequences were shown over 10 to 100 times, either from highly expressed transcripts duplications (Fig. 1c). The quality score distribution over all sequences was analysed to see if a subset of sequences had universally poor quality. The average quality for most sequences was high, with scores above 37, which indicated that a significant proportion of the sequences in a run had overall high quality (Fig. 1d). All other fastqc files were shown to have similar quality metrics compared to sample CGGA_909.

The aim of this study was to depict the comprehensive RNA-Seq transcriptomic profiling of Chinese gliomas. For the analyses we chose samples from grade II-IV gliomas based on central pathology reviews by independent board-certified neuropathologists and the 2007 WHO classification. All are human gliomas, differing in molecular pathology, primarily in terms of malignant progression and possible genetic background. Using a heatmap depicting differentially expressed genes using one-dimensional hierarchical clustering analysis, one can intuitively observe that samples along the malignant progression gradient showed different expression patterns (Fig. 2). Previously, Cheng et al.¹⁶ showed eight immune-related genes (FOXO3, IL6, IL10, ZBTB16, CCL18, AIMP1, FCGR2B, and MMP9) with the greatest prognostic value in grade IV gliomas. Most of which (6/7, no data for FCGR2B in this study) we confirmed were differentially expressed between different grades of gliomas.

**Figure 2: Heatmap and one-dimensional hierarchical clustering of differentially expressed genes (DEGs) across different grades of gliomas.**

Usage Notes

A wide range of public software, e.g., MapSplice¹², TopHat¹⁷, RSEM¹⁴, Cufflinks¹⁸, could be used for transcript assembly and quantification. Other public software combinations could be efficiently used for quantification: e.g., HISATs^19,20 and StringTie²¹ for quantification, used together with the ballgown package^19,22 for differential expression analyses. Currently, several gene annotations are available, such as RefSeq, UCSC, and Ensembl. With the rapid development of RNA-seq technology, we also recommend use of the more comprehensive gene annotation generated from the GENCODE²³ project, that reveals many new classes of RNAs, including pseudogenes and different types of ncRNAs.

A major advantage of this study is that we aimed to comprehensively profile the RNA-seq transcriptome along the malignant progression of gliomas. Consistent with previous research, we identified a set of differentially expressed genes involved in gliomas that may be biomarkers and therapeutic targets. However, there are several existing classifications for human gliomas, including Kernohan, AANS and excellent WHO grade classification. Although these classifications are able to simply recognize the malignant degree of gliomas, they cannot accurately determine the prognosis of patients and are not suitable for the current clinical treatments. We expect that other classification criterion will be developed based on the molecular pathology for understanding gliomas and precise medicines.

Additional Information

How to cite this article: Zhao, Z. et al. Comprehensive RNA-seq transcriptomic profiling in the malignant progression of gliomas. Sci. Data 4:170024 doi: 10.1038/sdata.2017.24 (2017).

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

Jiang, T. et al. Prevalence estimates for primary brain tumors in China: a multi-center cross-sectional study. Chinese medical journal 124, 2578–2583 (2011).
PubMed Google Scholar
Stupp, R. et al. Radiotherapy plus concomitant and adjuvant temozolomide for glioblastoma. The New England journal of medicine 352, 987–996 (2005).
Article CAS Google Scholar
Van Meir, E. G. et al. Exciting new advances in neuro-oncology: the avenue to a cure for malignant glioma. CA: a cancer journal for clinicians 60, 166–193 (2010).
Google Scholar
Cabili, M. N. et al. Integrative annotation of human large intergenic noncoding RNAs reveals global properties and specific subclasses. Genes & development 25, 1915–1927 (2011).
Article CAS Google Scholar
Bao, Z. S. et al. RNA-seq of 272 gliomas revealed a novel, recurrent PTPRZ1-MET fusion transcript in secondary glioblastomas. Genome research 24, 1765–1773 (2014).
Article CAS Google Scholar
International Cancer Genome Consortium PedBrain Tumor, P. Recurrent MET fusion genes represent a drug target in pediatric glioblastoma. Nature medicine 22, 1314–1320 (2016).
Article Google Scholar
Chen, H. M. et al. Enhanced expression and phosphorylation of the MET oncoprotein by glioma-specific PTPRZ1-MET fusions. FEBS letters 589, 1437–1443 (2015).
Article CAS Google Scholar
Stupp, R. et al. Effects of radiotherapy with concomitant and adjuvant temozolomide versus radiotherapy alone on survival in glioblastoma in a randomised phase III study: 5-year analysis of the EORTC-NCIC trial. The Lancet. Oncology 10, 459–466 (2009).
Article CAS Google Scholar
Ostrom, Q. T. et al. CBTRUS Statistical Report: Primary Brain and Central Nervous System Tumors Diagnosed in the United States in 2008-2012. Neuro-oncology 17(Suppl 4): iv1–iv62 (2015).
Article Google Scholar
Jiang, T. et al. CGCG clinical practice guidelines for the management of adult diffuse gliomas. Cancer letters 375, 263–273 (2016).
Article CAS Google Scholar
Louis, D. N. et al. The 2007 WHO classification of tumours of the central nervous system. Acta neuropathologica 114, 97–109 (2007).
Article Google Scholar
Wang, K. et al. MapSplice: accurate mapping of RNA-seq reads for splice junction discovery. Nucleic acids research 38, e178 (2010).
Article Google Scholar
Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics (Oxford, England) 25, 2078–2079 (2009).
Article Google Scholar
Li, B. & Dewey, C. N. RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome. BMC bioinformatics 12, 323 (2011).
Article CAS Google Scholar
Robinson, M. D. & Oshlack, A. A scaling normalization method for differential expression analysis of RNA-seq data. Genome biology 11, R25 (2010).
Article Google Scholar
Cheng, W. et al. Bioinformatic profiling identifies an immune-related risk signature for glioblastoma. Neurology 86, 2226–2234 (2016).
Article CAS Google Scholar
Kim, D. et al. TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome biology 14, R36 (2013).
Article Google Scholar
Trapnell, C. et al. Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nature biotechnology 28, 511–515 (2010).
Article CAS Google Scholar
Pertea, M., Kim, D., Pertea, G. M., Leek, J. T. & Salzberg, S. L. Transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie and Ballgown. Nature protocols 11, 1650–1667 (2016).
Article CAS Google Scholar
Kim, D., Langmead, B. & Salzberg, S. L. HISAT: a fast spliced aligner with low memory requirements. Nature methods 12, 357–360 (2015).
Article CAS Google Scholar
Pertea, M. et al. StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nature biotechnology 33, 290–295 (2015).
Article CAS Google Scholar
Frazee, A. C. et al. Ballgown bridges the gap between transcriptome assembly and expression analysis. Nature biotechnology 33, 243–246 (2015).
Article CAS Google Scholar
Harrow, J. et al. GENCODE: the reference human genome annotation for The ENCODE Project. Genome research 22, 1760–1774 (2012).
Article CAS Google Scholar

Data Citations

Beijing Neurosurgical Institute Figshare https://dx.doi.org/10.6084/m9.figshare.c.3594560 (2016)
Ye, W. NCBI Sequence Read Archive SRP027383 (2014)
Beijing Neurosurgical Institute NCBI Sequence Read Archive SRP091303 (2016)

Download references

Acknowledgements

We gratefully acknowledge the help and support from many collaborators for their help in RNA-seq experiments. This work was supported by The National Key Research and Development Plan (No. 2016YFC0902500).

Author information

Zheng Zhao and Fanlin Meng: These authors contributed equally to this work.

Authors and Affiliations

Beijing Neurosurgical Institute, Beijing, 100050, China
Zheng Zhao, Wen Wang, Zheng Wang, Chuanbao Zhang & Tao Jiang
School of Medicine, Tsinghua University, Beijing, 100084, China
Fanlin Meng
Department of Neurosurgery, Beijing Tiantan Hospital, Capital Medical University, Beijing, 100050, China
Wen Wang & Tao Jiang
Department of Neurosurgery, The Second Affiliated Hospital of Soochow University, Suzhou, 215123, China
Wen Wang
Centre of Brain Tumour, Beijing Institute for Brain Disorders, Beijing, 100069, China
Tao Jiang
China National Clinical Research Centre for Neurological Diseases, Beijing, 100050, China
Tao Jiang

Authors

Zheng Zhao
View author publications
You can also search for this author in PubMed Google Scholar
Fanlin Meng
View author publications
You can also search for this author in PubMed Google Scholar
Wen Wang
View author publications
You can also search for this author in PubMed Google Scholar
Zheng Wang
View author publications
You can also search for this author in PubMed Google Scholar
Chuanbao Zhang
View author publications
You can also search for this author in PubMed Google Scholar
Tao Jiang
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

T.J., and Z.Z. conceived the research. Z.Z. and F.M. designed the research. Z.Z., F.M., Z.W., and C.Z. performed data analysis. T.J., Z.Z. and F.M. wrote the manuscript; all authors reviewed and approved the manuscript.

Corresponding author

Correspondence to Tao Jiang.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

ISA-Tab metadata

Rights and permissions

This work is licensed under a Creative Commons Attribution 4.0 International License. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0 Metadata associated with this Data Descriptor is available at http://www.nature.com/sdata/ and is released under the CC0 waiver to maximize reuse.

Reprints and permissions

About this article

Cite this article

Zhao, Z., Meng, F., Wang, W. et al. Comprehensive RNA-seq transcriptomic profiling in the malignant progression of gliomas. Sci Data 4, 170024 (2017). https://doi.org/10.1038/sdata.2017.24

Download citation

Received: 19 October 2016
Accepted: 18 January 2017
Published: 14 March 2017
DOI: https://doi.org/10.1038/sdata.2017.24

This article is cited by

Ibrutinib disrupts blood-tumor barrier integrity and prolongs survival in rodent glioma model
- Sanghee Lim
- Minhye Kwak
- Sadhana Jackson
Acta Neuropathologica Communications (2024)
Association Between Diverse Cell Death Patterns Related Gene Signature and Prognosis, Drug Sensitivity, and Immune Microenvironment in Glioblastoma
- Jian Li
- Zhaoming Song
- Zhong Wang
Journal of Molecular Neuroscience (2024)
Identification and validation of a novel prognostic signature based on mitochondria and oxidative stress related genes for glioblastoma
- Shiao Tong
- Minqi Xia
- Daofeng Tian
Journal of Translational Medicine (2023)
Comprehensive analysis of the clinical and biological significances of cholesterol metabolism in lower-grade gliomas
- Rui Tao
- Ruoyu Huang
- Kuanyu Wang
BMC Cancer (2023)
Stalled oligodendrocyte differentiation in IDH-mutant gliomas
- Yanfei Wei
- Guanzhang Li
- Xiaolong Fan
Genome Medicine (2023)

Subjects

Abstract

Similar content being viewed by others

Background & Summary

Methods

Clinical specimen collection

RNA-seq library preparation and sequencing

RNA-seq data analysis

A database for human gliomas on RNA-seq

Code availability

Data Records

Technical Validation

Quality control—RNA integrity

Quality validation and analyses

Usage Notes

Additional Information

References

References

Data Citations

Acknowledgements

Author information

Authors and Affiliations

Contributions

Corresponding author

Ethics declarations

Competing interests

ISA-Tab metadata

Rights and permissions

About this article

Cite this article

Share this article

This article is cited by

Search

Quick links