To the Editor:

Next-generation sequencing technologies generate vast catalogs of short RNA sequences from which to mine microRNAs (miRNAs), 21–24-nucleotide regulatory RNAs derived from RNase III–mediated cleavage of hairpin transcripts. However, such data must be vetted to categorize miRNA precursors correctly and to interpret their evolution. A recent study annotated hundreds of miRNAs in three Drosophila species on the basis of singleton reads of heterogeneous length1. Our multimillion-read datasets indicated that most of these putative miRNAs were not produced by RNase III cleavage and that many were mRNA degradation fragments. We instead identified a distinct and smaller set of new miRNAs supported by high-confidence cloning signatures, including a high proportion of evolutionarily nascent mirtrons. Our data support a much lower rate of emergence of lineage-specific miRNAs than was previously inferred1, with a net flux of roughly 1 miRNA per million years (Myr) of drosophilid evolution.

Conserved miRNA genes are distinguished from bulk hairpins in that their terminal loops diverge more quickly than their stems2. However, species-specific miRNAs cannot be confidently identified by computational methods alone, as hundreds of thousands of Drosophila1,3,4,5 and human loci6 are plausible miRNA hairpins. Instead, we and others have turned to next-generation sequencing to identify recently evolved miRNAs, which lack support from evolutionary signatures (for example, Supplementary Table 1). Such deep sequence data often reveal heterogeneous size and read patterns with respect to predicted hairpins (Fig. 1 and Supplementary Fig. 1), indicating that only a subset of hairpins with reads are substrates of Dicer-driven biogenesis pathways. In particular, a single cloned short RNA cannot establish whether its predicted hairpin is an endogenous substrate of RNase III cleavage (Fig. 1).
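The loop-versus-stem signature can be illustrated with a toy calculation. The sketch below is a minimal illustration, not the method of refs. 2–5: it assumes one pair of gap-aligned orthologous hairpin sequences plus a dot-bracket structure aligned to them, counts substitutions separately at paired (stem) positions and in the terminal loop, and ignores bulges and flanking sequence. The function name `region_divergence` and the example sequences are hypothetical.

```python
def region_divergence(seq_a: str, seq_b: str, structure: str) -> dict:
    """Substitution rate in stem vs terminal loop for two aligned orthologous hairpins.

    `structure` is a dot-bracket string aligned column-by-column with both
    sequences; '-' marks an alignment gap.
    """
    if not (len(seq_a) == len(seq_b) == len(structure)):
        raise ValueError("sequences and structure must have equal aligned length")
    last_open = structure.rfind("(")
    first_close = structure.find(")", last_open + 1)
    if last_open == -1 or first_close == -1:
        raise ValueError("structure does not describe a hairpin")
    counts = {"stem": [0, 0], "loop": [0, 0]}  # [substitutions, compared sites]
    for i, (a, b, s) in enumerate(zip(seq_a, seq_b, structure)):
        if a == "-" or b == "-":
            continue  # gapped columns are skipped
        if s in "()":
            region = "stem"
        elif last_open < i < first_close:
            region = "loop"
        else:
            continue  # bulges and flanks are ignored in this sketch
        counts[region][0] += int(a != b)
        counts[region][1] += 1
    return {r: (subs / n if n else 0.0) for r, (subs, n) in counts.items()}


# Hypothetical orthologs: identical stems, fully diverged terminal loop.
print(region_divergence("GCAUCGAUAUGC", "GCAUAAGCAUGC", "((((....))))"))
# {'stem': 0.0, 'loop': 1.0}
```

A conserved miRNA would tend to show a higher loop rate than stem rate across many species pairs; a single comparison like this one is only illustrative.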

Figure 1: Putative miRNA loci annotated on the basis of single reads and plausible hairpin structures (center box) show distinct patterns when more reads are available.

Reads may be distributed throughout the inferred hairpin, have heterogeneous sizes and/or pair as duplexes lacking 3′ overhangs (top left and right); reads with any of these characteristics cannot be annotated as miRNAs. High-confidence miRNAs have multiple cloned 21–24-nucleotide reads with relatively fixed 5′ ends (bottom left). With sufficient sequencing, it is usually possible to identify the duplex partner miRNA* species, as well as other byproducts of miRNA biogenesis such as terminal loops or species flanking the pre-miRNA hairpin (bottom right).
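The Fig. 1 criteria can be expressed as a simple filter. This is a minimal sketch under stated assumptions, not the annotation pipeline used here: each read is reduced to a hypothetical (5′ position, length) tuple on one hairpin arm, and the thresholds (at least 5 reads, at least 90% of reads in the 21–24-nucleotide window, at least 70% sharing the dominant 5′ end) are illustrative values chosen for this example.

```python
from collections import Counter


def looks_like_mirna(reads, min_reads=5, min_size_frac=0.9, min_5p_frac=0.7):
    """Crude test of the Fig. 1 'high-confidence' pattern for one hairpin arm.

    reads: list of (five_prime_position, length) tuples.
    Thresholds are illustrative assumptions, not published cutoffs.
    """
    if len(reads) < min_reads:
        return False  # singletons or a handful of reads cannot establish a pattern
    in_window = sum(1 for _, length in reads if 21 <= length <= 24)
    dominant_5p = Counter(start for start, _ in reads).most_common(1)[0][1]
    return (in_window / len(reads) >= min_size_frac
            and dominant_5p / len(reads) >= min_5p_frac)


singleton = [(10, 22)]
mirna_like = [(10, 22)] * 8 + [(11, 21), (10, 23)]
degradation_like = [(i, 18 + i % 10) for i in range(10)]
print(looks_like_mirna(singleton), looks_like_mirna(mirna_like), looks_like_mirna(degradation_like))
# False True False
```

The singleton returns False regardless of its length, which mirrors the point in the text: one read cannot distinguish an RNase III product from a degradation fragment.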

Lu and colleagues reported 900 putative novel miRNAs, including 400 annotated under 'high-stringency' criteria, from three Drosophila species: D. melanogaster (Dme), D. simulans (Dsi) and D. pseudoobscura (Dps)1. They concluded that evolutionarily transient miRNA genes are continually born and lost, with only a small proportion of miRNAs fixed across the drosophilid radiation. Inspection of these annotations showed that 35 Dme, 47 Dsi and 30 Dps 'novel' miRNAs corresponded to orthologs of 50 distinct genes whose cloning and evolutionary characteristics had been described previously4,5,7 (miRBase 10.1 and Supplementary Tables 2–4). Another locus comprising multiple tandem hairpins corresponded to hp-CG4068, a hairpin RNA that generates endogenous small interfering RNAs (endo-siRNAs)8. We sought to understand the nature of the remaining hundreds of miRNA candidates, whose large number had been used to estimate a birth rate of 12 miRNAs per Myr of drosophilid evolution1.

We mapped 15 million Dme reads from diverse developmental stages and tissues, including 1 million from adult heads4,9. Because our adult-head data alone are about 60-fold larger than the 16,000 adult-head reads analyzed by Lu and colleagues1, we expected genuine miRNAs to be represented by at least 60-fold more reads in our data, and likely more, given that many are expressed in multiple stages and tissues. This held for the 35 Dme miRBase 10.1 loci designated 'novel' by Lu and colleagues1: these loci were represented by 1,247 reads in their data (34 reads per locus, although 6 loci were cloned only 2–3 times and 12 were singletons) but by 320,000 reads in our data (8,800 reads per locus). By contrast, the remaining 23 non-miRBase loci were severely under-represented in our data, with 9 cloned 1–6 times and 9 not recovered at all (Supplementary Table 2).
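The depth argument is simple proportional scaling. The arithmetic below is a sketch assuming that a genuine miRNA's read count scales linearly with library size and that the two adult-head libraries are otherwise comparable; the helper `expected_reads` is hypothetical, and all counts come from the paragraph above.

```python
# Counts taken from the text above.
our_head_reads = 1_000_000     # adult-head subset of our Dme data
lu_head_reads = 16_000         # adult-head reads analyzed by Lu and colleagues
expected_fold = our_head_reads / lu_head_reads
print(f"expected enrichment (heads only): {expected_fold:.1f}-fold")  # 62.5-fold


def expected_reads(reads_in_small_library: int, fold: float) -> float:
    """Reads a genuine miRNA should yield in the larger library (linear scaling assumed)."""
    return reads_in_small_library * fold


# A genuine miRNA cloned once in the smaller library should give ~60 head reads alone.
print(expected_reads(1, expected_fold))  # 62.5

# Observed totals for the 35 'novel' miRBase 10.1 loci. Our count draws on all
# 15 million reads, so it should exceed the head-only expectation.
lu_locus_reads, our_locus_reads = 1_247, 320_000
observed_fold = our_locus_reads / lu_locus_reads
print(f"observed enrichment: {observed_fold:.0f}-fold")  # 257-fold
```

Against this expectation, non-miRBase loci cloned only 1–6 times, or not at all, in the full dataset are far below what a genuine miRNA would yield.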

For non-miRBase loci cloned in our dataset, the reads mapped incoherently across the predicted hairpin and/or adjacent genomic regions (Fig. 1 and Supplementary Fig. 1). They also showed broadly heterogeneous sizes, in contrast to the restricted lengths of genuine Drosophila miRNAs (Fig. 2). Although some loci were conserved, the most abundant reads mapped to a ribosomal RNA (rRNA; Lu-mir-2018) and two small nucleolar RNAs (snoRNAs; Lu-mir-2324 and Lu-mir-2213), and 16 of the 20 remaining loci derived from mRNAs (Supplementary Table 2). Therefore, instances of conservation were attributable to protein-coding or functional RNA status and not to the evolutionary dynamics characteristic of genuine miRNAs (Supplementary Fig. 1a,b). Similar analysis revealed that hundreds of new Dsi and Dps miRNA candidates1 mapped to syntenic exons of Dme protein-coding transcripts (Supplementary Tables 3–6), with reads spanning the 18–28-nucleotide window used for cloning (Fig. 2). We conclude that the prior miRNA annotations1 included a high proportion of fragments derived from the degradation of diverse mRNAs and some noncoding RNAs (ncRNAs).

Figure 2: Size comparison of miRBase Dme, Dsi and Dps miRNAs and other miRNA candidates annotated by Lu and colleagues1.

(a–f) We used Solexa data from diverse Dme samples and Dsi or Dps embryos to assess the distribution of read sizes from annotated loci that were orthologous to miRBase 10.1 genes (a–c) or lacked miRBase orthologs (d–f). The top panels indicate that genuine Drosophila miRNAs produce a characteristic range of 21–24-nucleotide RNAs, with preference for 22 nucleotides (dashed reference lines). The other candidate miRNAs, nearly all of which were annotated on the basis of single reads1, showed broadly heterogeneous sizes in our larger datasets; note that we did not recover any reads for many of these loci.
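The size comparison in Fig. 2 reduces to summarizing the length distribution of reads at each locus. A minimal sketch, assuming the read lengths for one locus are already available as a list of integers; the function name `size_profile` and the example distributions are hypothetical.

```python
from collections import Counter


def size_profile(lengths: list) -> dict:
    """Summarize read lengths (nt) mapped to one locus."""
    counts = Counter(lengths)
    total = len(lengths)
    return {
        "modal_length": counts.most_common(1)[0][0],
        "frac_21_24": sum(c for n, c in counts.items() if 21 <= n <= 24) / total,
        "distinct_lengths": len(counts),
    }


# Hypothetical loci: a miRNA-like profile peaked at 22 nt, and a degradation-like
# profile spread evenly across the 18-28-nt cloning window.
print(size_profile([22] * 70 + [21] * 15 + [23] * 10 + [24] * 5))
# {'modal_length': 22, 'frac_21_24': 1.0, 'distinct_lengths': 4}
print(size_profile(list(range(18, 29)) * 3))
```

In the terms of Fig. 2, genuine miRNAs resemble the first profile, whereas the candidates lacking miRBase orthologs resemble the second.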

We therefore sought to gauge miRNA flux using independent small-RNA data. We and others annotated 147 miRNA loci (including 14 mirtrons) from 1 million Dme reads4,5,7, but >17 million additional reads9,10 yielded only 14 new miRNA loci and the high-confidence antisense locus Dme-mir-307-as (Supplementary Tables 7 and 8). This sequencing depth allowed us to assign confident miRNA cloning patterns to the novel loci, and most had miRNA* reads despite their evolutionary transience (Supplementary Figs. 1c and 2). Curiously, 5 of the 14 were mirtrons, a high proportion consistent with the hypothesis that mirtrons generally evolve more quickly than canonical miRNAs11,12. Four miRBase loci that did not meet confident read criteria are discussed in the Supplementary Text and Supplementary Fig. 3.
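To make "a high proportion" concrete, one can compare the mirtron fraction among the 14 new loci with that among the 147 previously annotated loci. The sketch below computes both fractions and a one-sided Fisher exact (hypergeometric upper-tail) probability from the counts in the text. It is an illustrative check, not an analysis reported in this correspondence, and it assumes loci can be treated as independent draws; `fisher_one_sided` is a hypothetical helper.

```python
from math import comb


def fisher_one_sided(k_new: int, n_new: int, k_old: int, n_old: int) -> float:
    """P(at least k_new mirtrons among n_new loci) if mirtrons were spread at random
    over all n_new + n_old loci (hypergeometric upper tail, one-sided Fisher test)."""
    total_loci = n_new + n_old
    total_mirtrons = k_new + k_old
    tail = sum(comb(total_mirtrons, x) * comb(total_loci - total_mirtrons, n_new - x)
               for x in range(k_new, min(total_mirtrons, n_new) + 1))
    return tail / comb(total_loci, n_new)


new_mirtrons, new_loci = 5, 14      # loci from >17 million additional reads
old_mirtrons, old_loci = 14, 147    # loci from the first 1 million reads
print(f"new: {new_mirtrons / new_loci:.1%}, previous: {old_mirtrons / old_loci:.1%}")
# new: 35.7%, previous: 9.5%
print(f"one-sided p = {fisher_one_sided(new_mirtrons, new_loci, old_mirtrons, old_loci):.3g}")
```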

We next mapped 3,712,683 and 3,318,524 small RNAs from mixed embryos of Dsi and Dps, respectively, and 3,442,645 reads from adult Dps heads (Supplementary Table 1). These data comprise 50–270 times the data previously used to estimate miRNA diversity1 and provided an appropriate basis for annotating miRNAs in these species without needing to consider their evolutionary features. Our datasets contained abundant reads for previously rare or uncloned Dsi and Dps orthologs of miRBase 10.1 loci (Supplementary Tables 9–12), consistent with the expectation that genuine miRNAs are recovered in proportion to sequencing depth. These reads yielded 11 new Dsi miRNAs, including 5 mirtrons (2 of which were orthologous to the novel Dme mirtrons mir-2489 and mir-2494), and >88 distinct novel Dps miRNAs, including 17 mirtrons (Supplementary Tables 9–12 and Supplementary Figs. 4 and 5; see also the Supplementary Text for discussion of potentially duplicate Dps loci). The overlap with the annotations of Lu and colleagues was minimal: only 4 of 261 Dsi loci and 19 of 598 Dps loci1 were shared between their annotations and ours. Moreover, nearly 300 of their reported Dsi and Dps miRNAs had no reads in our data, and 100 had fewer than 5 reads (Supplementary Tables 3 and 4). Deep sequencing therefore failed to validate most of the previously reported miRNAs1, and the minimal overlap in annotated loci shows that the differences did not arise from applying more 'conservative' rather than more 'lenient' cutoffs to a common set of hairpins.
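Comparing two annotation sets comes down to intersecting genomic intervals. The exact overlap criterion behind the comparison above is not specified here, so this minimal sketch assumes that two loci overlap if they share at least one nucleotide on the same chromosome and strand; the coordinates and the helper `overlapping_loci` are hypothetical. The reported counts are then expressed as fractions.

```python
def overlapping_loci(ours, theirs):
    """Loci in `ours` sharing >=1 nt with any locus in `theirs` (same chromosome and strand).

    Loci are (chrom, strand, start, end) tuples with inclusive coordinates.
    A quadratic scan is adequate for a few hundred loci.
    """
    hits = []
    for chrom, strand, start, end in ours:
        if any(chrom == c and strand == s and start <= e and b <= end
               for c, s, b, e in theirs):
            hits.append((chrom, strand, start, end))
    return hits


# Hypothetical coordinates: the first pair overlaps; the second lies on opposite strands.
ours = [("chr2L", "+", 100, 180), ("chr3R", "-", 500, 570)]
theirs = [("chr2L", "+", 170, 250), ("chr3R", "+", 500, 570)]
print(overlapping_loci(ours, theirs))  # [('chr2L', '+', 100, 180)]

# Overlap fractions from the counts reported in the text.
for species, shared, total in [("Dsi", 4, 261), ("Dps", 19, 598)]:
    print(f"{species}: {shared}/{total} = {shared / total:.1%}")
# Dsi: 4/261 = 1.5%
# Dps: 19/598 = 3.2%
```

Requiring strand identity matters for hairpin loci, since an antisense transcript (such as Dme-mir-307-as) can occupy the same genomic interval as a distinct locus.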

Although rates of miRNA flux among different Drosophila species might be expected to be reasonably similar, Lu and colleagues annotated vastly different numbers of species-specific miRNAs in Dme, Dsi and Dps1. This disparity seems unlikely to reflect their different sampling depths in these species, as all of their datasets were about 100-fold smaller than those analyzed in the present study. Our annotations from multimillion-read datasets instead yielded numbers of new genes consistent with the phylogenetic relationships of these species. We recovered few new miRNAs in the closely related sister species Dme and Dsi but many more in the distant Dps (Fig. 3); most newly identified Dps genes were conserved only in its sister species D. persimilis (Dper). The overall flux in the miRNA repertoire was consistent between the two clades: 45–47 miRNAs cloned from Dme or Dsi have no obscura-group homologs, whereas 88 miRNAs cloned from Dps have no melanogaster-group homologs. Assuming 55 Myr of divergence between these clades, as before1, this puts the rate of drosophilid miRNA flux at 0.82–1.6 genes per Myr, far below the 12 genes per Myr proposed earlier1. Notably, the number of species-restricted mirtrons relative to canonical miRNAs was disproportionately high in all three species (Fig. 3 and Supplementary Figs. 2, 4 and 5). Therefore, mirtrons and canonical miRNAs show distinct evolutionary dynamics of emergence and fixation, even though they generate functionally identical regulatory RNAs.
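The flux estimate is a single division of lineage-specific gene counts by divergence time. The arithmetic below uses only numbers from the text and, like the text, assumes 55 Myr of divergence between the melanogaster and obscura groups; the dictionary labels are illustrative.

```python
DIVERGENCE_MYR = 55  # assumed melanogaster/obscura split, as in ref. 1
PRIOR_RATE = 12      # previously proposed miRNAs per Myr (ref. 1)

lineage_specific = {
    "Dme/Dsi, no obscura-group homolog (low)": 45,
    "Dme/Dsi, no obscura-group homolog (high)": 47,
    "Dps, no melanogaster-group homolog": 88,
}
rates = {label: n / DIVERGENCE_MYR for label, n in lineage_specific.items()}
for label, rate in rates.items():
    print(f"{label}: {rate:.2f} miRNAs per Myr")

low, high = min(rates.values()), max(rates.values())
print(f"flux: {low:.2f}-{high:.1f} per Myr")  # flux: 0.82-1.6 per Myr
print(f"{PRIOR_RATE / high:.1f}- to {PRIOR_RATE / low:.1f}-fold below the prior estimate")
# 7.5- to 14.7-fold below the prior estimate
```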

Figure 3: Flux of drosophilid miRNA genes assessed using multimillion-read datasets in three species.

Small RNAs were cloned from the species shown in dark green; orthology of the novel miRNAs annotated in this study was determined with respect to the species shown in light green. Because not all loci are necessarily present in every species of a given branch, some values are designated as approximate. For example, the Dps and Dper genomes both lack orthologs of nine miRNA genes present in the sophophoran and/or proto-drosophilid ancestor (Supplementary Fig. 7); these genes are inferred to have died in the obscura lineage. Among the dozen Dme- or Dsi-cloned miRNAs for which aligning sequences were found only in their closest sister species, only a few are so far supported by cloned small RNAs from more than one species (for example, the highly species-restricted miR-2489 was cloned from both Dme and Dsi). We cannot exclude that some of these miRNAs will prove to be unique to a single species. Note that mirtrons comprise a small fraction of the deeply conserved set of miRNAs but a much higher fraction of lineage-restricted miRNAs in various drosophilid genomes.

The net rate of miRNA flux combines genes born and genes lost, but distinguishing birth from death is challenging. For example, the 70 miRNAs shared by Dps and Dper for which no orthologs exist in any melanogaster-group genome might have been 'born' in the ancestor of the obscura lineage or might have 'died' in the ancestor of the melanogaster lineage (Fig. 3). In addition, the poorer state of the Dsi genome assembly obscures whether this species truly lost some genes (nine pan-drosophilid miRNAs have gaps or errors in DroSim1; Supplementary Fig. 6). However, we could confidently judge that nine miRNAs distributed in four operons died in the obscura group, as they were ancestrally conserved but absent from both Dps and Dper (Fig. 3 and Supplementary Fig. 7). Conversely, the small number of Dme, Dsi and Dps miRNAs lacking aligned sequences in any other sequenced species are good candidates for 'newly born' miRNAs. Their identification supports the concept that new miRNA substrates occasionally arise de novo from the neutral evolution of transcripts with hairpin character1.
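The birth/death reasoning above follows parsimony on the presence or absence of aligned orthologs. A minimal sketch, assuming a hypothetical `outgroup` entry that records whether an ortholog exists outside the four sophophoran species considered here; `classify` and its labels are illustrative, and the sketch does not model assembly gaps such as those in DroSim1, so an absence in Dsi should not be taken at face value.

```python
MEL = ("Dme", "Dsi")     # melanogaster group
OBS = ("Dps", "Dper")    # obscura group


def classify(presence: dict) -> str:
    """Rough birth/death call for one miRNA from presence (True) or absence (False)
    of an aligned ortholog in Dme, Dsi, Dps, Dper and a hypothetical 'outgroup'."""
    in_mel = [presence[s] for s in MEL]
    in_obs = [presence[s] for s in OBS]
    outgroup = presence["outgroup"]
    if not any(in_mel) and not any(in_obs):
        return "absent from all four species"
    if not outgroup and sum(in_mel + in_obs) == 1:
        return "candidate newly born (single species)"
    if outgroup and any(in_mel) and not any(in_obs):
        return "lost in obscura lineage"
    if outgroup and any(in_obs) and not any(in_mel):
        return "lost in melanogaster lineage"
    if any(in_obs) and not any(in_mel):
        return "ambiguous: born in obscura or lost in melanogaster ancestor"
    if any(in_mel) and not any(in_obs):
        return "ambiguous: born in melanogaster or lost in obscura ancestor"
    return "shared by both groups"


print(classify({"Dme": True, "Dsi": True, "Dps": False, "Dper": False, "outgroup": True}))
# lost in obscura lineage
print(classify({"Dme": False, "Dsi": False, "Dps": True, "Dper": True, "outgroup": False}))
# ambiguous: born in obscura or lost in melanogaster ancestor
```

The second call corresponds to the 70 Dps/Dper miRNAs discussed above: without outgroup evidence, presence in only one clade cannot separate a birth from a death.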

Nascent miRNAs might have less precise cleavage registers than well-conserved miRNAs, but because miRNAs are produced by RNase III enzymes, duplexes of appropriate size should be cloned with sufficient sequencing, as observed in our data (Figs. 1 and 2 and Supplementary Figs. 1–5). As in previous analyses1, we assigned singleton reads to hundreds of candidate hairpins (see URLs), and these loci evolved neutrally with respect to hairpin character. However, as few of these loci are likely to be bona fide substrates for Dicer-driven miRNA biogenesis (Fig. 1), their evolution is not generally germane to the evolution of genuine miRNAs.
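The duplex expectation can be checked directly. RNase III cleavage leaves 2-nucleotide 3′ overhangs, so reads from the two arms of a genuine hairpin should pair with 3′ overhangs of about 2 nucleotides, whereas blunt or recessed pairings argue against Dicer processing (Fig. 1). The sketch below assumes reads given as hypothetical 0-based, inclusive (start, end) positions on the hairpin and a dot-bracket structure; `pair_table` and `duplex_overhangs` are hypothetical helpers, and bulges near read ends make the estimate approximate.

```python
def pair_table(structure: str) -> dict:
    """Map each paired position in a dot-bracket string to its partner (0-based)."""
    stack, pairs = [], {}
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced dot-bracket structure")
            j = stack.pop()
            pairs[i], pairs[j] = j, i
    return pairs


def duplex_overhangs(structure: str, read_5p: tuple, read_3p: tuple):
    """3' overhangs (nt) of a 5p-arm/3p-arm read pair, or None if a 5' end is unpaired.

    Positive values are 3' overhangs, 0 is blunt, negative is recessed.
    """
    pairs = pair_table(structure)
    (s5, e5), (s3, e3) = read_5p, read_3p
    if s5 not in pairs or s3 not in pairs:
        return None
    # 5p read's 3' end versus the partner of the 3p read's 5' end, and vice versa.
    return e5 - pairs[s3], e3 - pairs[s5]


toy = "..((((((((....)))))))).."   # hypothetical 24-nt hairpin
print(duplex_overhangs(toy, (2, 9), (16, 23)))   # (2, 2): RNase III-like duplex
print(duplex_overhangs(toy, (2, 7), (16, 21)))   # (0, 0): blunt pairing
```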

In principle, there may exist hairpin loci that mostly generate short species via generic RNA catabolism but for which a fraction of reads derive from RNase III cleavage. The evolutionary dynamics of this population should prove relevant for understanding the birth of miRNA genes. However, experimental evidence beyond deep sequencing is necessary to demonstrate unequivocally their processing by Drosha and Dicer. Because the majority of animal euchromatin is actively transcribed13,14, deep sequencing is expected to recover degradation fragments from many incidental hairpins. This is the case even with protocols that select for 5′ phosphates (and presumably against degradation fragments), because endogenous kinases can phosphorylate arbitrary short RNAs15. The existence of exceptionally diverse populations of Piwi-interacting RNAs (piRNAs) and endo-siRNAs6 further highlights that non-miRNA reads can be abundant in total RNA libraries. In conclusion, confident annotation of miRNAs from deep sequence data yields consistent rates of canonical miRNA and mirtron evolution among the drosophilids and provides evidence for only a limited set of species-specific miRNAs in this genus.

Accession numbers.

The three small-RNA datasets from Dsi embryos and Dps embryos and heads were submitted to NCBI GEO under series GSE13677.

URLs.

Additional supporting material is available at http://www.internagenomics.com/public/dros0811.