Biogen: Data Science and Docker Swarm

Apr 25 2016

written by Theo Platt, Associate Director, Biogen, and Karl Gutwin, Senior Data Architect, Biogen

The Data Sciences department at Biogen has been using Docker and watching the (r)evolution for a couple of years. Last year, as our experience with Docker grew and the use cases expanded, we built our own early Docker Swarm cluster with homegrown orchestration capabilities.

Through cutting-edge science and medicine, Biogen discovers, develops and delivers innovative therapies worldwide for people living with serious neurological, autoimmune and rare diseases. Founded in 1978, Biogen is one of the world’s oldest independent biotechnology companies, and patients worldwide benefit from its leading multiple sclerosis and innovative hemophilia therapies.

The Data Science team within Biogen is tasked with discovering new insights from rich and complicated data sets spread across all aspects of Biogen’s operations. Sometimes a data scientist unlocks a simple one-off insight, but on other occasions the insight sparks another project to visualize data, continually analyze streams or repeatedly apply algorithms. This is where the data engineers step in, taking those insights and transforming them into proofs of concept or minimum viable products.

Every project is different, and the data engineers have to be able to quickly build a new stack of tools. Sometimes a project only requires a Neo4j graph store with a visualization or query interface; other times it calls for a full stack such as MongoDB, Elasticsearch, MuleSoft, APIs and JavaScript-based UIs. The projects have a short shelf life, are aimed at a small target audience and, most often, require only limited support. However, to follow best practices of Agile development with continuous integration and deployment, we prefer deploying multiple environments per application (dev, test, prod, etc.)
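A stack like the ones described above can be sketched as a Compose file. The service names, image versions and ports below are illustrative assumptions only, not the team’s actual configuration:

```yaml
# Hypothetical docker-compose.yml for one small project stack
# (images, versions and ports are placeholders, not Biogen's real setup)
version: '2'
services:
  mongodb:
    image: mongo:3.2
    volumes:
      - mongo-data:/data/db     # persist the document store
  elasticsearch:
    image: elasticsearch:2.3
    volumes:
      - es-data:/usr/share/elasticsearch/data
  ui:
    image: nginx:1.9            # stand-in for a JavaScript-based UI
    ports:
      - "80:80"
    depends_on:
      - mongodb
      - elasticsearch
volumes:
  mongo-data:
  es-data:
```

The same definition can then be brought up identically in dev, test and prod environments, which is what makes the multi-environment approach practical for short-lived projects.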

A few years ago, in order to achieve this, we would provision one virtual machine (VM) per component and write a Chef script to install and configure each machine. In late 2014, we had application stacks using almost 10 VMs per environment, which, multiplied by several environments, led to an excessive operations burden for our proof-of-concept apps.

At that time, we started exploring replacing those VMs with Docker containers. We began to build entire application stacks on single VMs running Docker, using a custom deployment script. Data engineers could now script entire stacks, deploy them on their laptops, push to dev/test/prod and share common patterns between projects. Additionally, this reduced the burden on the operations group by allowing the team to move to a more DevOps-oriented model.

We now had a better toolset, but there were still some lingering issues to be addressed. We still had to request multiple (albeit fewer) VMs per project; the DNS names given to the VMs were nonsense to the end user (and requesting friendlier ones was another procedural step); we couldn’t easily scale out beyond a very large VM; and some projects were beginning to need bare-metal performance.

To tackle our evolving needs we started to build a Swarm cluster from a handful of high-end servers. Combining our deployment script base with Swarm, dnsdock, HashiCorp Vault, Docker UI and other components, we assembled a running cluster. By eliminating the need to size and request multiple VMs for a proof-of-concept project, engineers can script an application stack and deploy it almost immediately, reducing the turnaround time on projects. As the engineers have gained more control over the stack, we have seen best practices emerge for solving particular problem spaces.
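For reference, bootstrapping a standalone Docker Swarm (v1) cluster of this era looked roughly like the following. The addresses are placeholders, the hosted token discovery backend is just one option, and the team’s actual setup layered dnsdock, Vault and their deployment scripts on top of this:

```
# Generate a cluster ID using the hosted token discovery service
docker run --rm swarm create
# -> prints <cluster_id>

# On each node (Docker daemon listening on :2375), join the cluster
docker run -d swarm join --addr=<node_ip>:2375 token://<cluster_id>

# On one host, start the Swarm manager
docker run -d -p 4000:2375 swarm manage token://<cluster_id>

# Point the Docker client at the manager; containers now
# schedule transparently across the cluster's nodes
docker -H tcp://<manager_ip>:4000 run -d mongo:3.2
```

Because the manager speaks the standard Docker API, existing deployment scripts written against a single Docker host can be pointed at the cluster largely unchanged.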

While some projects use the Swarm infrastructure, other projects still use single VMs running Docker. Since we set up the Docker environment last year, many new features have shipped that we are looking to adopt as standard components in our environment.

The Data Science group at Biogen continues to work at the leading edge of technology to support the mission of delivering innovative therapies and we are always looking for talented engineers to join the team.

To learn more about Biogen, head to the Biogen website. For more information about Docker Swarm, watch this whiteboard video series and set up your first cluster.

