Table 1.

Average classification performance of the quality features in each subset group.

| Quality feature | Group A | Group B | Group C |
|---|---|---|---|
| RAW_Basic_Statistics | 0.500 | 0.500 | 0.500 |
| RAW_Per_base_sequence_quality | 0.681 | 0.738 | 0.747 |
| RAW_Per_tile_sequence_quality | 0.645 | 0.668 | 0.658 |
| RAW_Per_sequence_quality_scores | 0.693 | 0.738 | 0.754 |
| RAW_Per_base_sequence_content | 0.650 | 0.622 | 0.632 |
| RAW_Per_sequence_GC_content | 0.684 | 0.686 | 0.652 |
| RAW_Per_base_N_content | 0.674 | 0.732 | 0.745 |
| RAW_Sequence_Length_Distribution | 0.502 | 0.523 | 0.511 |
| RAW_Sequence_Duplication_Levels | 0.606 | 0.586 | 0.602 |
| RAW_Overrepresented_sequences | 0.771 | 0.763 | 0.740 |
| RAW_Adapter_Content | 0.551 | 0.526 | 0.538 |
| RAW_Kmer_Content | 0.559 | 0.574 | 0.549 |
| MAP_SE_no_mapping | 0.840 | 0.817 | 0.841 |
| MAP_SE_uniquely | 0.829 | 0.821 | 0.850 |
| MAP_SE_multiple | 0.860 | 0.805 | 0.859 |
| MAP_SE_overall | 0.840 | 0.817 | 0.841 |
| MAP_MI_no_mapping | 0.837 | 0.809 | 0.825 |
| MAP_MI_uniquely | 0.839 | 0.817 | 0.841 |
| MAP_MI_multiple | 0.847 | 0.798 | 0.841 |
| MAP_MI_overall | 0.846 | 0.810 | 0.827 |
| LOC_Promoter | 0.719 | 0.717 | 0.717 |
| LOC_5_UTR | 0.706 | 0.692 | 0.687 |
| LOC_3_UTR | 0.745 | 0.711 | 0.725 |
| LOC_1st_Exon | 0.699 | 0.688 | 0.702 |
| LOC_Other_Exon | 0.733 | 0.724 | 0.733 |
| LOC_1st_Intron | 0.687 | 0.716 | 0.705 |
| LOC_Other_Intron | 0.686 | 0.717 | 0.697 |
| LOC_Downstream | 0.681 | 0.688 | 0.697 |
| LOC_Distal_Intergenic | 0.727 | 0.719 | 0.710 |
| TSS_−4500 | 0.690 | 0.692 | 0.676 |
| TSS_−3500 | 0.696 | 0.707 | 0.710 |
| TSS_−2500 | 0.682 | 0.694 | 0.699 |
| TSS_−1500 | 0.702 | 0.685 | 0.708 |
| TSS_−500 | 0.703 | 0.731 | 0.723 |
| TSS_+500 | 0.706 | 0.718 | 0.718 |
| TSS_+1500 | 0.686 | 0.702 | 0.724 |
| TSS_+2500 | 0.691 | 0.709 | 0.722 |
| TSS_+3500 | 0.677 | 0.703 | 0.712 |
| TSS_+4500 | 0.691 | 0.692 | 0.701 |
| MAP_PE_con_no_mapping | 0.833 | 0.711 | 0.711 |
| MAP_PE_con_uniquely | 0.852 | 0.772 | 0.772 |
| MAP_PE_con_multiple | 0.830 | 0.708 | 0.708 |
| MAP_PE_dis_uniquely | 0.771 | 0.676 | 0.676 |
| MAP_PE_cod_no_mapping | 0.837 | 0.626 | 0.626 |
| MAP_PE_cod_uniquely | 0.773 | 0.673 | 0.673 |
| MAP_PE_cod_multiple | 0.849 | 0.655 | 0.655 |
| MAP_PE_overall | 0.853 | 0.724 | 0.724 |
  • Classification performance is measured as the area under the receiver operating characteristic curve (auROC), which ranges from 0.5 for random classification to 1.0 for perfect classification. MAP features perform best across all three groups. The RAW features that perform well do so across all three groups. For LOC and TSS, only some features perform well on average; however, these features can still be more important for some of the subsets within each group.
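
To make the auROC scale in the note concrete, the following minimal Python sketch computes the area under the ROC curve for a single feature using the rank-sum (Mann-Whitney U) identity. It is an illustration only and not the implementation used to produce Table 1; the function name `au_roc` and the toy labels and scores are hypothetical.

```python
def au_roc(labels, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) identity.

    labels: sequence of 0/1 values (1 = positive class)
    scores: sequence of numbers, higher meaning "more likely positive"
    """
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    n_pos = sum(label for _, label in pairs)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("auROC needs at least one positive and one negative sample")

    # Sum the 1-based ranks of the positives; tied scores share their mean rank.
    rank_sum_pos = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pairs[j][0] == pairs[i][0]:
            j += 1
        mean_rank = (i + 1 + j) / 2.0  # mean of ranks i+1 .. j
        rank_sum_pos += mean_rank * sum(label for _, label in pairs[i:j])
        i = j

    # U counts positive/negative pairs ranked correctly (ties count as half).
    u_stat = rank_sum_pos - n_pos * (n_pos + 1) / 2.0
    return u_stat / (n_pos * n_neg)


# Hypothetical toy data: four samples, two of each class.
labels = [0, 0, 1, 1]
print(au_roc(labels, [0.1, 0.4, 0.35, 0.8]))  # 0.75: one positive ranked below a negative
print(au_roc(labels, [0.1, 0.2, 0.8, 0.9]))   # 1.0: perfect separation
print(au_roc(labels, [0.5, 0.5, 0.5, 0.5]))   # 0.5: a constant score carries no information
```

The last call shows why 0.5 is the baseline: a score that does not vary between classes ranks every positive/negative pair as a tie, which is equivalent to random classification.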