HitKeeper, a generic software package for hit list management
1 Nestlé Research Center, Department of BioAnalytical Science, PO Box 44, CH-1000 Lausanne 26, Switzerland
2 EPFL Database Laboratory, CH-1015 Lausanne, Switzerland
3 Swiss Institute of Bioinformatics, Vital-IT group, CH-1015 Lausanne, Switzerland
Source Code for Biology and Medicine 2007, 2:2 doi:10.1186/1751-0473-2-2
Published: 28 March 2007
The automated annotation of biological sequences (protein, DNA) relies on the computation of hits (predicted features) on the sequences using various algorithms. Public databases of biological sequences provide a wealth of biological "knowledge", for example manually validated annotations (features) that are located on the sequences, but mining the sequence annotations and especially the predicted and curated features requires dedicated tools. Due to the heterogeneity and diversity of the biological information, it is difficult to handle redundancy, frequent updates, taxonomic information and "private" data together with computational algorithms in a common workflow.
We present HitKeeper, a software package that controls the fully automatic handling of multiple biological databases and of hit list calculations on a large scale. The software implements an asynchronous update system that introduces updates and computes hits as soon as new data become available. A query interface enables the user to search sequences by specifying constraints, such as retrieving sequences that contain specific motifs, or a defined arrangement of motifs ("metamotifs"), or filtering based on the taxonomic classification of a sequence.
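The kind of constrained search described above can be illustrated with a minimal Python sketch. It is not HitKeeper's query language or API, which are not shown here. It assumes that hits are stored as (motif, start, end) records with 1-based inclusive coordinates, that a "metamotif" means the listed motifs occurring in order without overlapping, and that each sequence carries a tuple of taxon names. The names `Hit`, `has_metamotif`, `search`, the `max_gap` parameter, and the motif labels are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One predicted feature on a sequence (hypothetical record layout)."""
    motif: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def has_metamotif(hits, motifs, max_gap=None):
    """Return True if `motifs` occur in the given order, without overlap.

    If `max_gap` is set, consecutive hits may be separated by at most
    that many residues. Assumed semantics, for illustration only.
    """
    def extend(index, prev_end):
        if index == len(motifs):
            return True
        for hit in hits:
            if hit.motif != motifs[index]:
                continue
            if prev_end is not None:
                gap = hit.start - prev_end - 1
                if gap < 0 or (max_gap is not None and gap > max_gap):
                    continue
            if extend(index + 1, hit.end):
                return True
        return False

    return extend(0, None)


def search(hits_by_seq, lineage_by_seq, motifs, taxon=None, max_gap=None):
    """Return sorted ids of sequences matching the metamotif and taxon filter."""
    matches = []
    for seq_id, hits in hits_by_seq.items():
        if taxon is not None and taxon not in lineage_by_seq.get(seq_id, ()):
            continue
        if has_metamotif(hits, motifs, max_gap):
            matches.append(seq_id)
    return sorted(matches)


# Made-up example data: sequence ids, motif names, and lineages are invented.
hits_by_seq = {
    "seqA": [Hit("zinc_finger", 10, 30), Hit("kinase", 50, 300)],
    "seqB": [Hit("kinase", 5, 250), Hit("zinc_finger", 260, 280)],
    "seqC": [Hit("zinc_finger", 1, 20), Hit("kinase", 400, 650)],
}
lineage_by_seq = {
    "seqA": ("Eukaryota", "Metazoa"),
    "seqB": ("Eukaryota", "Metazoa"),
    "seqC": ("Bacteria",),
}

metamotif = ["zinc_finger", "kinase"]
print(search(hits_by_seq, lineage_by_seq, metamotif))
print(search(hits_by_seq, lineage_by_seq, metamotif, taxon="Eukaryota"))
print(search(hits_by_seq, lineage_by_seq, metamotif, max_gap=50))
```

In this sketch, seqB is never returned because its two motifs occur in the reverse order; the taxon filter and the gap limit each narrow the remaining matches further. A production system would evaluate such constraints inside the database rather than in memory.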
The software provides a generic and modular framework to handle the redundancy and incremental updates of biological databases, and an original query language. It is published under the terms and conditions of version 2 of the GNU General Public License and is available at http://hitkeeper.sourceforge.net.