Average classification performance of the quality features in each subset group.
Quality feature | Group A | Group B | Group C |
---|---|---|---|
RAW_Basic_Statistics | 0.500 | 0.500 | 0.500 |
RAW_Per_base_sequence_quality | 0.681 | 0.738 | 0.747 |
RAW_Per_tile_sequence_quality | 0.645 | 0.668 | 0.658 |
RAW_Per_sequence_quality_scores | 0.693 | 0.738 | 0.754 |
RAW_Per_base_sequence_content | 0.650 | 0.622 | 0.632 |
RAW_Per_sequence_GC_content | 0.684 | 0.686 | 0.652 |
RAW_Per_base_N_content | 0.674 | 0.732 | 0.745 |
RAW_Sequence_Length_Distribution | 0.502 | 0.523 | 0.511 |
RAW_Sequence_Duplication_Levels | 0.606 | 0.586 | 0.602 |
RAW_Overrepresented_sequences | 0.771 | 0.763 | 0.740 |
RAW_Adapter_Content | 0.551 | 0.526 | 0.538 |
RAW_Kmer_Content | 0.559 | 0.574 | 0.549 |
MAP_SE_no_mapping | 0.840 | 0.817 | 0.841 |
MAP_SE_uniquely | 0.829 | 0.821 | 0.850 |
MAP_SE_multiple | 0.860 | 0.805 | 0.859 |
MAP_SE_overall | 0.840 | 0.817 | 0.841 |
MAP_MI_no_mapping | 0.837 | 0.809 | 0.825 |
MAP_MI_uniquely | 0.839 | 0.817 | 0.841 |
MAP_MI_multiple | 0.847 | 0.798 | 0.841 |
MAP_MI_overall | 0.846 | 0.810 | 0.827 |
LOC_Promoter | 0.719 | 0.717 | 0.717 |
LOC_5_UTR | 0.706 | 0.692 | 0.687 |
LOC_3_UTR | 0.745 | 0.711 | 0.725 |
LOC_1st_Exon | 0.699 | 0.688 | 0.702 |
LOC_Other_Exon | 0.733 | 0.724 | 0.733 |
LOC_1st_Intron | 0.687 | 0.716 | 0.705 |
LOC_Other_Intron | 0.686 | 0.717 | 0.697 |
LOC_Downstream | 0.681 | 0.688 | 0.697 |
LOC_Distal_Intergenic | 0.727 | 0.719 | 0.710 |
TSS_−4500 | 0.690 | 0.692 | 0.676 |
TSS_−3500 | 0.696 | 0.707 | 0.710 |
TSS_−2500 | 0.682 | 0.694 | 0.699 |
TSS_−1500 | 0.702 | 0.685 | 0.708 |
TSS_−500 | 0.703 | 0.731 | 0.723 |
TSS_+500 | 0.706 | 0.718 | 0.718 |
TSS_+1500 | 0.686 | 0.702 | 0.724 |
TSS_+2500 | 0.691 | 0.709 | 0.722 |
TSS_+3500 | 0.677 | 0.703 | 0.712 |
TSS_+4500 | 0.691 | 0.692 | 0.701 |
MAP_PE_con_no_mapping | 0.833 | 0.711 | 0.711 |
MAP_PE_con_uniquely | 0.852 | 0.772 | 0.772 |
MAP_PE_con_multiple | 0.830 | 0.708 | 0.708 |
MAP_PE_dis_uniquely | 0.771 | 0.676 | 0.676 |
MAP_PE_cod_no_mapping | 0.837 | 0.626 | 0.626 |
MAP_PE_cod_uniquely | 0.773 | 0.673 | 0.673 |
MAP_PE_cod_multiple | 0.849 | 0.655 | 0.655 |
MAP_PE_overall | 0.853 | 0.724 | 0.724 |
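Classification performance is measured as the area under the receiver operating characteristic curve (auROC), which ranges from 0.5 for a random classification to 1.0 for a perfect classification. MAP features perform best in all three groups, with the MAP_SE and MAP_MI features scoring between 0.798 and 0.860; the MAP_PE features are similarly strong in Group A (0.771 to 0.853) but weaker in Groups B and C (0.626 to 0.772). The RAW features that perform well (for example RAW_Overrepresented_sequences, 0.740 to 0.771) do so in all three groups. For LOC and TSS, only some of the features perform well on average; however, these features can still be more important for individual subsets within each group.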
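
To make the auROC scale used in the table concrete, the following is a minimal sketch of how such a value can be computed for a single feature. It is not the pipeline that produced the table: the function name `auroc`, the 0/1 label encoding, and the toy inputs are assumptions chosen for illustration. The sketch uses the pairwise (Mann-Whitney) formulation, in which auROC is the fraction of positive/negative sample pairs that the score ranks correctly, with ties counted as half.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison (Mann-Whitney U).

    labels: 1 for the positive class, 0 for the negative class (assumed encoding).
    scores: feature value or classifier score; higher means "more positive".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")

    # Count positive/negative pairs ranked correctly; a tie counts as half.
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, not taken from the table above.
perfect = auroc([0, 0, 1, 1], [0.1, 0.2, 0.8, 0.9])       # every pair ranked correctly
constant = auroc([0, 1, 0, 1], [1.0, 1.0, 1.0, 1.0])      # all pairs tied
partial = auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])      # 3 of 4 pairs ranked correctly
```

In this sketch a feature that takes the same value for every sample yields exactly 0.5, because every pair is a tie. The pairwise loop is quadratic in the number of samples, which is acceptable for illustration; a rank-based computation would be preferable for large data sets.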