Figure 2.

Schematic representation of the sequence and motif pipelines. Several successive versions of a given source database usually coexist at different stages of a pipeline. The databases are processed by three scripts that run simultaneously, much like system daemons. HKLoader watches the source data files for changes (using the date/time stamp); it is responsible for parsing and converting the raw data, detecting redundancy, and transferring the "clean" data into the SQL database. HKUpdater updates the hit list. Once a motif database enters the prepare state, the new motifs are computed against the sequences that are in the current state. Similarly, when a sequence database enters the prepare state, the new sequences are computed against the motifs that are in the current state. The two computational tasks, sequences-vs-motifs and motifs-vs-sequences, are never executed simultaneously, which keeps the two pipelines synchronized. Once the calculations are done, HKPublisher takes charge of deploying the databases to external computing elements (e.g. a BLAST server), and the database flagged as ready is promoted to current ("in production"): all subsequent queries are then applied to this database. Previous versions can be kept as archives or deleted to reclaim space.
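
The state transitions described in this caption can be illustrated with a minimal Python sketch. It is not the authors' implementation: the state names prepare, ready and current come from the caption, while the `Database` class, the `archived` label, the `update_hits` and `publish` functions, and the use of a single shared lock to keep the two computational tasks from overlapping are all assumptions made for illustration.

```python
import threading
from dataclasses import dataclass

# "prepare", "ready" and "current" are named in the caption;
# "archived" is an assumed label for superseded versions.
PREPARE, READY, CURRENT, ARCHIVED = "prepare", "ready", "current", "archived"


@dataclass
class Database:
    kind: str            # "sequence" or "motif"
    version: int
    state: str = PREPARE


# Assumption: one shared lock guarantees that sequences-vs-motifs and
# motifs-vs-sequences never run at the same time.
compute_lock = threading.Lock()


def update_hits(new_db, databases, compare):
    """Compare a database in 'prepare' with the 'current' database of the other kind."""
    other_kind = "motif" if new_db.kind == "sequence" else "sequence"
    with compute_lock:
        for db in databases:
            if db.kind == other_kind and db.state == CURRENT:
                compare(new_db, db)  # hypothetical callback that fills the hit list
        new_db.state = READY


def publish(ready_db, databases):
    """Promote a 'ready' database to 'current' and archive the one it replaces."""
    if ready_db.state != READY:
        raise ValueError("only a database in the ready state can be promoted")
    for db in databases:
        if db.kind == ready_db.kind and db.state == CURRENT:
            db.state = ARCHIVED
    ready_db.state = CURRENT
```

In this sketch a new motif database would first pass through `update_hits`, which compares it with the current sequence database, and then through `publish`, which makes it the version that subsequent queries see.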

Hau et al. Source Code for Biology and Medicine 2007 2:2 doi:10.1186/1751-0473-2-2