Review
The Drosophila genome

https://doi.org/10.1016/S0959-437X(00)00140-4

Abstract

The past year has been a spectacular one for Drosophila research. The sequencing and annotation of the Drosophila melanogaster genome have allowed a comprehensive analysis of the first three eukaryotes to be sequenced (yeast, worm and fly), including an analysis of the fly's value as a model for the study of human disease. This year has also seen the initiation of a full-length cDNA sequencing project and the first analysis of Drosophila development using high-density DNA microarrays containing several thousand Drosophila genes. For the first time, homologous recombination has been demonstrated in flies, and targeted gene disruptions may not be far off.

Introduction

‘The Lord in his wisdom made the fly
But forgot to tell us why’
Ogden Nash, The Fly.

With the sequencing of the Drosophila genome completed this year in an unprecedented collaboration between the Berkeley Drosophila Genome Project and Celera Genomics, we can start to address the question posed by Ogden Nash. A set of papers in the March 24th, 2000 issue of Science constitutes the first synthesis of the computational analysis of the fly's DNA sequence. Historically, studies of the fly have provided many of the essential components for this synthesis: the chromosome theory of heredity (1915), X-ray induced mutations (1927), cytogenetic maps (1938), the first genomic libraries (1975) and the first genome-wide mutational screens to identify the genes regulating development (1980) (for review, see [1]). The fly genome has been sequenced using the whole-genome ‘shotgun’ method [2], [3] and the assembly verified by comparison to a traditional physical map [4], [5]. This sequence and the annotation of the genome have permitted an initial comparative analysis [6••]. The implications for biology and medicine are significant, as described both by Kornberg and Krasnow [7] and by Rubin et al. [6••]. Another report [8] describes the status of the cDNA resources needed both to accurately annotate the genomic sequence and for functional studies. In this review, I highlight these major advances in Drosophila research and, in addition, pay particular attention to a recent advance, homologous recombination, that may make targeted gene disruption widely available in this model organism.

Drosophila is the third eukaryotic genome to be sequenced, following the 12 Mb yeast (Saccharomyces cerevisiae) [9] and the 97 Mb nematode worm (Caenorhabditis elegans) [10]. The Drosophila genome is ∼180 Mb, a third of which is centric heterochromatin. The centric heterochromatin cannot be cloned stably and therefore the sequence (Release 1) is primarily that of the euchromatic portion of the fly genome. The 120 Mb of euchromatin resides on four chromosomes: two large autosomes (second and third), the X chromosome, and a small fourth chromosome containing only ∼1 Mb of euchromatin.
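
The sizes quoted above can be checked with a few lines of arithmetic. The following is a minimal sketch in Python, assuming only the approximate figures given in the text (12, 97 and ∼180 Mb, with one third of the fly genome heterochromatic). The function and variable names are hypothetical and introduced purely for illustration; they are not part of any published analysis.

```python
# Approximate genome sizes quoted in the text, in megabases (Mb).
GENOME_SIZES_MB = {
    "yeast (S. cerevisiae)": 12,
    "worm (C. elegans)": 97,
    "fly (D. melanogaster)": 180,
}

# Fraction of the fly genome that is centric heterochromatin ("a third").
FLY_HETEROCHROMATIN_FRACTION = 1 / 3


def euchromatin_mb(genome_mb, heterochromatin_fraction):
    """Return the euchromatic portion of a genome, in Mb."""
    return genome_mb * (1 - heterochromatin_fraction)


def size_ratio(larger_mb, smaller_mb):
    """Return how many times larger one genome is than another."""
    return larger_mb / smaller_mb


if __name__ == "__main__":
    fly_mb = GENOME_SIZES_MB["fly (D. melanogaster)"]
    eu_mb = euchromatin_mb(fly_mb, FLY_HETEROCHROMATIN_FRACTION)
    print(f"Fly euchromatin: ~{eu_mb:.0f} Mb")
    for name, size_mb in GENOME_SIZES_MB.items():
        print(f"Fly genome is {size_ratio(fly_mb, size_mb):.1f}x the size of {name}")
```

With these inputs the euchromatic portion comes out at about 120 Mb, consistent with the figure given for the four chromosomes.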

Whole-genome shotgun assembly of Drosophila

Despite much skepticism from the sequencing community, Gene Myers and his colleagues at Celera Genomics [3•] undertook, and successfully generated, an assembly of the euchromatic portion of the Drosophila genome. Approximately 24,000 sequence reads were generated from bacterial artificial chromosome (BAC) clones (∼163 kb) and 3 million sequence reads were generated from 2 and 10 kb genomic clones. Paired-end sequence was essential to the correct assembly: 72% of the sequence reads were in the form of

Gene annotation

Computational analysis was used to predict transcript and protein sequence as well as potential functions for each putative protein. Genes were identified using two gene-finding programs, ‘Genscan’ [12] and ‘Genie’ [13], [14], in conjunction with the results of complementary DNA and database searches. The final gene structures were determined by human curation. The initial computational analysis predicts 13,601 genes, just twice the number for the simple single-celled yeast and fewer than

Comparative genomics

With the sequencing of three eukaryotic genomes now complete, Rubin et al. [6••] have compared their core proteomes. The core proteome is defined as the set of non-redundant proteins produced in each organism. The core proteomes of yeast, worms and flies contain 4383, 9453 and 8065 protein families, respectively. For the comparison of the core proteomes, a protein is defined as an ortholog if it shows similarity for at least 80% of the length of its sequence. Flies share 16% of their genes with

Implications for human disease

It is hard to imagine that this small invertebrate, the fruitfly, can serve as a model for human disease. Yet with the cloning of the Drosophila homeobox genes (for review, see [27]), it became apparent that numerous processes controlling metazoan development are conserved in higher organisms (reviewed in [7]). In an attempt to estimate the extent to which different types of human disease genes are found in flies, Rubin et al. [6••] identified 289 genes. This set of genes implicated in human

Gene expression and protein function

Integral to interpreting the complete genomic sequence is having a cDNA that represents each gene. These cDNAs will be used for studies of protein function, and their sequence will be used to determine gene structure, including 5′ and 3′ non-coding exons and intron/exon boundaries. Rubin et al. [8] describe a set of cDNAs (‘Drosophila Gene Collection Release 1.0’) corresponding to >40% of the genes in the fly, and strategies for isolating cDNAs representing the remaining genes. This set of

Homologous recombination in Drosophila

One significant limitation of Drosophila melanogaster as a model organism has been the inability to make targeted gene disruptions of the kind that are possible in yeast and mice. The first paper demonstrating a system for homologous recombination in flies was published this summer by Yikang Rong and Kent Golic [41•]. Although they have not yet generated mutants, they have successfully rescued yellow (y) mutant flies by substituting the wild-type allele for the y mutant allele.

Their system is dependent on three

Conclusions and future directions

Two large questions remain. Embedded in the vast amount of non-coding sequence are the control elements that direct proper spatial and temporal gene expression. How do we identify them? And with the initial identification of the proteome of Drosophila, what are the functions of the proteins and how do they interact with one another? With the complete D. melanogaster sequence and anticipated sequence from another Drosophila species, possibly D. pseudoobscura, comparative analysis can be used to

Acknowledgements

I thank Catherine Nelson, Joanne Topol and Gerry Rubin for critically reading the manuscript. I thank Kent Golic, Bruce Hay, Paul Lasko, Troy Littleton and David Wassarman for providing preprints of their manuscripts. I would also like to thank members of the Berkeley Drosophila Genome Project and Celera Genomics for their united efforts to produce the invaluable Drosophila genome sequence. This work was supported by NIH grant P50HG00750.

References and recommended reading

Papers of particular interest, published within the annual period of review, have been highlighted as:

  • • of special interest

  • •• of outstanding interest

References (45)

  • GM Rubin et al.

    Comparative genomics of the eukaryotes

    Science

    (2000)
  • TB Kornberg et al.

    The Drosophila genome sequence: implications for biology and medicine

    Science

    (2000)
  • GM Rubin et al.

    A Drosophila complementary DNA resource

    Science

    (2000)
  • A Goffeau et al.

    Life with 6000 genes

    Science

    (1996)
  • The C. elegans Sequencing Consortium

    Genome sequence of the nematode C. elegans: a platform for investigating biology

    Science

    (1998)
  • M Ashburner

    A biologist's view of the Drosophila genome annotation assessment project

    Genome Res

    (2000)
  • D Kulp et al.

    A generalized hidden Markov model for the recognition of human genes in DNA

    Intelligent Systems Mol Biol

    (1996)
  • MG Reese et al.

    Genie-gene finding in Drosophila melanogaster

    Genome Res

    (2000)
  • M Ashburner et al.

    Gene ontology: tool for the unification of biology

    Nat Genet

    (2000)
  • Sekelsky, Brodsky, Burtis K

    DNA repair

    J Cell Biol

    (2000)
  • S Mount

    Pre-messenger RNA processing factors in the Drosophila genome

    J Cell Biol

    (2000)
  • SP Lasko

    The Drosophila genome: translation factors and RNA binding proteins

    J Cell Biol

    (2000)