CWL

The Common Workflow Language (CWL) is a specification for describing analysis workflows and tools in a way that makes them portable and scalable across a variety of software and hardware environments, from workstations to cluster, cloud, and high performance computing (HPC) environments. CWL is designed to meet the needs of data-intensive science, such as bioinformatics, medical imaging, astronomy, physics, and chemistry. It builds on technologies such as JSON-LD for data modeling and Docker for portable runtime environments.

CWL is developed by a multi-vendor working group of organizations and individuals who aim to enable scientists to share data analysis workflows. The CWL project is maintained on GitHub and follows the Open-Stand.org principles for collaborative open standards development. Legally, CWL is a member project of Software Freedom Conservancy and is formally managed by the elected CWL leadership team. Everyday project decisions, however, are made by the CWL community, which is open for participation by anyone.
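To make the idea of a portable tool description more concrete, the following is a minimal sketch of what a CWL document can look like, built here as a Python dictionary and serialized to JSON (CWL documents may be written in YAML or JSON). The tool it describes, a wrapper around the `echo` command with a single string input named `message`, is an illustrative assumption and does not come from the text above; the helper names `make_echo_tool` and `to_json` are likewise hypothetical.

```python
import json


def make_echo_tool():
    """Return a minimal CWL CommandLineTool description as a plain dict.

    The wrapped command (echo) and the input name (message) are
    illustrative choices, not taken from any specific project.
    """
    return {
        "cwlVersion": "v1.2",
        "class": "CommandLineTool",
        "baseCommand": "echo",
        "inputs": {
            # One string input, passed as the first positional argument.
            "message": {"type": "string", "inputBinding": {"position": 1}},
        },
        # This sketch declares no output files.
        "outputs": [],
    }


def to_json(tool):
    """Serialize the tool description to a JSON string."""
    return json.dumps(tool, indent=2)


if __name__ == "__main__":
    print(to_json(make_echo_tool()))
```

Saved to a file, a description like this could be handed to a CWL runner; because the description names the command and its inputs rather than a particular machine, the same file is meant to work unchanged across environments.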

The Common Workflow Language (CWL) is a community-developed specification for interoperable scientific workflows, supported by multiple workflow engine vendors and open source projects. The CWL Viewer, started as a third-year project at The University of Manchester and further developed as part of the BioExcel project, visualizes any CWL workflow definition and shows its annotations and composition.

The public CWL Viewer instance has become the de facto standard web visualization tool for workflows within the larger CWL community: its list of known workflows shows that more than 2000 individual workflows have been visualized.

In 2017 the CWL Viewer was presented at the ISMB/ECCB conference where it won the F1000 Best Poster Award. The development and hosting of CWL Viewer is now being transitioned to Curii Corporation, an industry partner in the CWL project that is developing the Arvados platform.

CoE: BioExcel

bioBB

A new graphical user interface (GUI) is being developed on top of these workflow building blocks (and eventually on top of all the building blocks developed in BioExcel): the biobb web server. This GUI will ease the use of BioExcel workflows and tools for a large community that is not yet familiar with HPC programming but has a real interest in the topic. That community includes pharmaceutical companies, which are used to working with GUIs, and entry-level users, whose interest is demonstrated by the high number of registered users (~4000) and pipelines run to date on web server interfaces such as MDWeb. The biobb web server will allow users to run a set of chosen, pre-configured workflows built from BioExcel building blocks, such as a structure quality check, a structure energy minimization, a complete MD setup, or a complete MD simulation (with length restrictions). The GUI will also add interactivity to the building blocks. For example, a user can run a quality check of a structure while a 3D representation of the molecule is shown in the same interface, highlighting the region of particular interest. The same interactivity can be applied to the analyses generated by the workflows.

Workflows will be submitted to and handled by a queue manager, which will serve them in an on-demand processing model carried out by virtual machines automatically deployed in an OpenNebula OneFlow cloud environment. A direct connection to HPC infrastructures, for submitting long molecular dynamics simulations prepared through the portal, will be studied.
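The queueing model described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: threads stand in for the on-demand virtual machines, `run_workflow` is a hypothetical placeholder for launching a pre-configured workflow, and the workflow names are invented. It is not the biobb web server implementation and uses no OpenNebula API.

```python
import queue
import threading


def run_workflow(name):
    """Hypothetical placeholder for running one pre-configured workflow."""
    return f"{name}: done"


def serve(jobs, max_workers=2):
    """Drain a FIFO queue of workflow names with a bounded pool of workers.

    Each worker thread stands in for a virtual machine that is started
    only when there is work and released when the queue is empty.
    """
    pending = queue.Queue()
    for job in jobs:
        pending.put(job)

    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            try:
                name = pending.get_nowait()
            except queue.Empty:
                return  # no work left: this worker can be released
            outcome = run_workflow(name)
            with lock:
                results[name] = outcome

    # Start no more workers than there are jobs (on-demand provisioning).
    count = min(max_workers, len(jobs))
    threads = [threading.Thread(target=worker) for _ in range(count)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


if __name__ == "__main__":
    print(serve(["quality_check", "energy_minimization", "md_setup"]))
```

Capping the number of workers keeps resource use bounded when many users submit at once, while starting workers only when jobs exist mirrors the on-demand deployment of virtual machines.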

The biobb server will also be the entry point for BioExcel building blocks. The web page will gather all the information on how to obtain, install, and run the building blocks and workflows generated by BioExcel: for developers or experts in the field (GitHub, Bioconda, BioContainers, VMs/cloud), for HPC users (environment modules), and for entry-level users (Galaxy, KNIME, web server).

CoE: BioExcel