The Common Workflow Language (CWL) is a specification for describing analysis workflows and tools in a way that makes them portable and scalable across a variety of software and hardware environments, from workstations to cluster, cloud, and high performance computing (HPC) environments. CWL is designed to meet the needs of data-intensive science, such as Bioinformatics, Medical Imaging, Astronomy, Physics, and Chemistry. CWL is developed by a multi-vendor working group consisting of organizations and individuals aiming to enable scientists to share data analysis workflows. The CWL project is maintained on GitHub and follows the Open-Stand.org principles for collaborative open standards development. Legally, CWL is a member project of Software Freedom Conservancy and is formally managed by the elected CWL leadership team; however, everyday project decisions are made by the CWL community, which is open for participation by anyone. CWL builds on technologies such as JSON-LD for data modeling and Docker for portable runtime environments.
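To make the idea of a portable tool description more concrete, the following is a minimal sketch of what a CWL document can look like. It builds a small CommandLineTool description as a Python dictionary and serializes it to JSON, which CWL accepts alongside YAML. The wrapped command (`echo`), the input name `message`, and the helper `to_cwl_json` are illustrative assumptions and are not taken from any BioExcel workflow.

```python
import json

# Minimal CWL CommandLineTool description wrapping the `echo` command.
# The tool and the input name "message" are illustrative assumptions.
tool = {
    "cwlVersion": "v1.0",
    "class": "CommandLineTool",
    "baseCommand": "echo",
    "inputs": {
        "message": {
            "type": "string",
            # Place the value as the first positional argument of `echo`.
            "inputBinding": {"position": 1},
        }
    },
    # This tool declares no output files.
    "outputs": [],
}


def to_cwl_json(document: dict) -> str:
    """Serialize a CWL document to JSON text (hypothetical helper)."""
    return json.dumps(document, indent=2)


if __name__ == "__main__":
    print(to_cwl_json(tool))
```

Saved to a file, a description like this should be runnable by a CWL-compliant engine, and the same file can be shared unchanged between workstation, cluster, and cloud environments, which is the portability the specification aims for.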
The Common Workflow Language (CWL) is a community-developed specification for interoperable scientific workflows, supported by multiple workflow engine vendors and open source projects. The CWL Viewer, started as a third-year project at The University of Manchester and further developed as part of the BioExcel project, visualizes any CWL workflow definition and shows its annotations and composition.
The public CWL Viewer instance has become the de facto standard web visualization tool for workflows within the larger CWL community – the list of known workflows shows more than 2000 individual workflows have been visualized.
In 2017 the CWL Viewer was presented at the ISMB/ECCB conference where it won the F1000 Best Poster Award. The development and hosting of CWL Viewer is now being transitioned to Curii Corporation, an industry partner in the CWL project that is developing the Arvados platform.
A new Graphical User Interface (GUI) is being developed on top of these workflow building blocks (and eventually on top of all the building blocks developed in BioExcel): the biobb web server. This GUI will ease the usage of BioExcel workflows and tools for a large community that is not yet familiar with HPC programming but has a real interest in the topic. This community includes pharmaceutical companies (accustomed to working with GUIs) as well as entry-level users, whose interest is demonstrated by the high number of registered users (~4000) and pipelines run to date with web server interfaces such as MDWeb. The biobb web server will allow users to run a set of chosen, pre-configured workflows built using BioExcel building blocks, such as a structure quality check, a structure energy minimization, a complete MD setup, or a complete MD simulation (with length restrictions). The GUI will also add interactivity to our building blocks. A good example is the possibility of running a quality check of a structure while a 3D representation of the molecule is shown in the same interface, highlighting the region of the structure of particular interest. This interactivity can also be applied to the set of analyses generated by the workflows.
Workflows will be submitted to and handled by a queue manager, which will serve them in an on-demand processing model performed by Virtual Machines automatically deployed in an OpenNebula OneFlow cloud environment. A direct connection to HPC supercomputing infrastructures, to submit long molecular dynamics simulations prepared using the portal, will be studied.
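The on-demand model described above can be illustrated with a small sketch. This is not the portal's implementation: the `QueueManager` and `Worker` classes, the worker cap, and the round-robin assignment are all hypothetical, and the workers only record the workflows they receive instead of deploying real virtual machines.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Worker:
    """Stands in for a virtual machine deployed on demand (hypothetical)."""
    name: str
    completed: list = field(default_factory=list)

    def run(self, workflow: str) -> None:
        # A real worker would execute the workflow; here we only record it.
        self.completed.append(workflow)


class QueueManager:
    """FIFO queue that creates workers only when jobs are waiting."""

    def __init__(self, max_workers: int) -> None:
        self.max_workers = max_workers
        self.pending = deque()
        self.workers = []

    def submit(self, workflow: str) -> None:
        self.pending.append(workflow)

    def process(self) -> None:
        # Deploy only as many workers as there are pending jobs, up to the cap.
        needed = min(len(self.pending), self.max_workers)
        while len(self.workers) < needed:
            self.workers.append(Worker(f"vm-{len(self.workers)}"))
        # Serve jobs in submission order, assigning them round-robin.
        turn = 0
        while self.pending:
            worker = self.workers[turn % len(self.workers)]
            worker.run(self.pending.popleft())
            turn += 1


if __name__ == "__main__":
    manager = QueueManager(max_workers=2)
    for name in ["quality_check", "energy_minimization", "md_setup"]:
        manager.submit(name)
    manager.process()
    for worker in manager.workers:
        print(worker.name, worker.completed)
```

The point of the sketch is the separation of concerns: users only submit workflows, while the queue manager decides how many workers to bring up and which one serves each job.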
The biobb web server will also be the entry point for BioExcel building blocks. The web page will gather all the information on how to obtain, install and run the building blocks and workflows generated by BioExcel: for developers or experts in the field (GitHub, Bioconda, BioContainers, VMs/Cloud), for HPC users (environment modules), and for entry-level users (Galaxy, KNIME, web server).
The CPMD code is a parallelized plane wave/pseudopotential implementation of Density Functional Theory, particularly designed for ab-initio molecular dynamics. CPMD is currently the most HPC-efficient code for performing quantum molecular dynamics simulations using the Car-Parrinello molecular dynamics scheme. CPMD simulations are usually restricted to systems of a few hundred atoms. In order to extend its domain of applicability to (much) larger biologically relevant systems, a hybrid quantum mechanical/molecular mechanics (QM/MM) interface, employing routines from the GROMOS96 molecular dynamics code, has been developed.
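For readers unfamiliar with the Car-Parrinello scheme mentioned above, its textbook formulation treats the electronic orbitals as fictitious dynamical variables that are propagated together with the nuclei. As a sketch in standard notation (the symbols are not taken from this text, and prefactor conventions for the fictitious kinetic term differ between references), the extended Lagrangian can be written as:

```latex
\mathcal{L}_{\mathrm{CP}}
  = \sum_{I} \tfrac{1}{2} M_I \dot{\mathbf{R}}_I^{2}
  + \sum_{i} \tfrac{1}{2} \mu \, \langle \dot{\psi}_i | \dot{\psi}_i \rangle
  - E_{\mathrm{KS}}\left[ \{\psi_i\}, \{\mathbf{R}_I\} \right]
  + \sum_{i,j} \Lambda_{ij} \left( \langle \psi_i | \psi_j \rangle - \delta_{ij} \right)
```

Here $M_I$ and $\mathbf{R}_I$ are the nuclear masses and positions, $\psi_i$ are the Kohn-Sham orbitals, $\mu$ is the fictitious electron mass, $E_{\mathrm{KS}}$ is the Kohn-Sham energy, and the Lagrange multipliers $\Lambda_{ij}$ keep the orbitals orthonormal.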
CP2K is a quantum chemistry and solid state physics software package that can perform atomistic simulations of solid state, liquid, molecular, periodic, material, crystal, and biological systems. CP2K provides a general framework for different modeling methods such as DFT using the mixed Gaussian and plane waves approaches GPW and GAPW. Supported theory levels include DFTB, LDA, GGA, MP2, RPA, semi-empirical methods (AM1, PM3, PM6, RM1, MNDO), and classical force fields (AMBER, CHARMM). CP2K can do simulations of molecular dynamics, metadynamics, Monte Carlo, Ehrenfest dynamics, vibrational analysis, core level spectroscopy, energy minimization, and transition state optimization using NEB or dimer method. CP2K is written in Fortran 2008 and can be run efficiently in parallel using a combination of multi-threading, MPI, and CUDA. CP2K is freely available under the GPL license. It is therefore easy to give the code a try, and to make modifications as needed.
pmx is a service for users who need to do free energy calculations. Free energy calculations are extremely common in life sciences research. In molecular dynamics studies, for example when investigating how mutations affect protein function, these calculations provide insight into changes in stability and affinity.
One important branch of free energy calculations involves alchemical transformations such as the mutation of amino acids or nucleic acids, or ligand modifications. A challenging aspect of these calculations is the creation of the associated structures and molecular topologies. pmx provides an automated framework for the introduction of amino acid mutations in proteins. Several state-of-the-art force fields are supported that can be used in the GROMACS molecular dynamics package.
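The reasoning behind alchemical transformations is commonly expressed as a thermodynamic cycle. As a sketch in standard notation (the symbols are illustrative and not taken from this text), the effect of a mutation from wild type (wt) to mutant (mut) on binding affinity can be obtained from two alchemical legs, one in the bound state and one in the free state:

```latex
\Delta\Delta G_{\mathrm{bind}}
  = \Delta G_{\mathrm{bind}}^{\mathrm{mut}} - \Delta G_{\mathrm{bind}}^{\mathrm{wt}}
  = \Delta G_{\mathrm{wt}\to\mathrm{mut}}^{\mathrm{bound}}
  - \Delta G_{\mathrm{wt}\to\mathrm{mut}}^{\mathrm{free}}
```

The two forms are equal because free energy is a state function, so the contributions around the closed cycle sum to zero. The alchemical legs on the right-hand side are the quantities that need the hybrid structures and topologies mentioned above. The same construction applies to stability changes, with the folded and unfolded states taking the place of the bound and free states.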
HADDOCK is a versatile information-driven flexible docking approach for the modelling of biomolecular complexes. HADDOCK distinguishes itself from ab-initio docking methods in that it can integrate information derived from biochemical, biophysical or bioinformatics methods to enhance sampling, scoring, or both. The information that can be integrated is quite diverse: interface restraints from NMR or MS, mutagenesis experiments, or bioinformatics predictions; various orientational restraints from NMR; and, recently, cryo-electron microscopy maps. Currently, HADDOCK allows the modelling of large assemblies consisting of up to 6 different molecules, which, together with its rich data support, provides a truly integrative modelling platform.