Open Access Brief reports

RMol: a toolset for transforming SD/Molfile structure information into R objects

Martin Grabner1, Kurt Varmuza2 and Matthias Dehmer1*

Author Affiliations

1 Department of Biomedical Sciences and Engineering, Institute for Bioinformatics and Translational Research, University for Health Sciences, Medical Informatics and Technology (UMIT), Eduard Wallnöfer Zentrum 1, Hall in Tyrol, A-6060, Austria

2 Laboratory for Chemometrics, Institute of Chemical Engineering, Vienna University of Technology, A-1060 Vienna, Getreidemarkt 9/166, Austria

Source Code for Biology and Medicine 2012, 7:12  doi:10.1186/1751-0473-7-12

Published: 14 November 2012

Abstract

Background

The graph-theoretical analysis of molecular networks has a long tradition in chemoinformatics. As has been demonstrated frequently, the Molfile format is a well-designed format for encoding chemical structures and structure-related information of organic compounds. However, when it comes to using modern programming languages for statistical data analysis in bio- and chemoinformatics, R, one of the most powerful free languages, lacks tools to process Molfile data collections and to import molecular network data.

Results

We designed an R object that allows a lossless mapping of structural information from Molfiles into R objects. This makes the RMol object an anchor for connecting Molfile data collections with R libraries for analyzing graphs. A set of R functions associated with the RMol objects completes the toolset and serves to organize, describe and manipulate the converted data sets. Further, we bypass R-typical limits on manipulating large data sets by storing R objects in bz-compressed serialized files instead of RData files.
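The two ideas in this paragraph, mapping a Molfile connection table into a graph-like object and storing that object in compressed serialized form, can be illustrated with a minimal sketch. It is written in Python purely for illustration and is not RMol code: the function names `parse_molfile` and `adjacency`, the dictionary layout, and the three-atom example record are all assumptions made for this example. It assumes a V2000 Molfile and reads only the counts line, the atom symbols and the bond block.

```python
import bz2
import pickle

# Hypothetical V2000 Molfile (heavy atoms of ethanol), written line by line
# so that the fixed-width columns of the counts and bond lines are explicit.
MOLFILE = "\n".join([
    "ethanol",
    "  hypothetical-example",
    "",
    "  3  2  0  0  0  0  0  0  0  0999 V2000",
    "    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "    1.5000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "    2.2000    1.2000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0",
    "  1  2  1  0",
    "  2  3  1  0",
    "M  END",
])


def parse_molfile(text):
    """Read name, atom symbols and bonds from a V2000 Molfile string."""
    lines = text.split("\n")
    counts = lines[3]                 # counts line follows the 3 header lines
    n_atoms = int(counts[0:3])        # columns 1-3: number of atoms
    n_bonds = int(counts[3:6])        # columns 4-6: number of bonds
    # The atom symbol is the fourth whitespace-separated field of an atom line.
    atoms = [lines[4 + i].split()[3] for i in range(n_atoms)]
    bonds = []
    for i in range(n_bonds):
        b = lines[4 + n_atoms + i]
        # Fixed-width fields: first atom, second atom, bond type.
        bonds.append((int(b[0:3]), int(b[3:6]), int(b[6:9])))
    return {"name": lines[0].strip(), "atoms": atoms, "bonds": bonds}


def adjacency(mol):
    """Build an undirected adjacency list keyed by 1-based atom index."""
    adj = {i: [] for i in range(1, len(mol["atoms"]) + 1)}
    for a, b, _bond_type in mol["bonds"]:
        adj[a].append(b)
        adj[b].append(a)
    return adj


mol = parse_molfile(MOLFILE)
graph = adjacency(mol)

# Serialize the object and compress it with bzip2, then restore it again.
blob = bz2.compress(pickle.dumps(mol))
restored = pickle.loads(bz2.decompress(blob))
```

In this sketch the restored object compares equal to the original, which is the sense in which a serialized, compressed copy preserves the parsed structure. How RMol itself lays out its objects and files is described in the article, not here.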

Conclusions

By design, RMol is an R toolset without dependencies on other libraries or programming languages. It can be integrated into pipelines for serialized batch analysis of network data and therefore helps to process SDF data sets in R efficiently. It is freely available under the BSD licence. The script source can be downloaded from http://sourceforge.net/p/rmol-toolset.