Genome coverage, literally speaking. The challenge of annotating 200 genomes with 4 million publications

Paul Janssen, Leonid Goldovsky, Victor Kunin, Nikos Darzentas, Christos Ouzounis, Rafi Benotmane, Paul Borgermans

    Research output: peer-review


    In late 2004, 200 complete genomes had been sequenced and made available to the research community. At the time of writing this viewpoint, that number had further risen to 221 and will have undoubtedly increased again before publication. These genomes, which represent a wide range of species from archaea to human, are a highly valuable knowledge resource for the scientific community. However, the sequencing of a full genome is just the first step in research; it must be followed by the functional characterization of genes and proteins. In this context, it is interesting to see how well represented these sequenced species are in terms of publications. We have thus obtained the number of abstracts published per species and normalized that count by the number of genes in that species to obtain a comparable measure for the number of publications per gene for all completed and published genomes. This simple measure highlights the current knowledge gap between various organisms and could further serve as a guideline for selecting genomes for sequencing projects, high-throughput functional genomics and database annotation efforts.
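The normalization described in the abstract (abstracts per species divided by the number of genes in that species) can be sketched roughly as follows. This is only an illustrative sketch, not the authors' pipeline: the function names `publications_per_gene` and `rank_by_coverage` are hypothetical, and the counts in `example` are invented placeholders rather than values from the paper.

```python
def publications_per_gene(abstract_count: int, gene_count: int) -> float:
    """Number of abstracts for a species divided by its number of genes."""
    if gene_count <= 0:
        raise ValueError("gene_count must be positive")
    return abstract_count / gene_count


def rank_by_coverage(species: dict[str, tuple[int, int]]) -> list[tuple[str, float]]:
    """Sort species from least to most literature coverage per gene.

    `species` maps a species name to (abstract_count, gene_count).
    """
    scores = {
        name: publications_per_gene(abstracts, genes)
        for name, (abstracts, genes) in species.items()
    }
    return sorted(scores.items(), key=lambda item: item[1])


# Placeholder counts, invented for illustration only; not taken from the paper.
example = {
    "species_A": (50_000, 5_000),
    "species_B": (300, 3_000),
}
ranking = rank_by_coverage(example)
```

Species at the start of `ranking` have the fewest publications per gene, which is the kind of knowledge gap the abstract suggests could guide the selection of genomes for further study.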
    Original language: English
    Pages (from-to): 397-399
    Journal: EMBO Reports
    Issue number: 5
    State: Published - 1 May 2005
