Why Use Docker In Machine Learning? We Explain With Use Cases

Source: analyticsindiamag.com

Docker is everywhere in the software industry today. Best known as a DevOps tool, it has won over many developers, system administrators and engineers, among others.

What is Docker?

“Docker is a tool that helps users to exploit operating-system-level virtualisation to develop and deliver software in packages called containers.”

This technical definition may sound complicated, but all you need to know is that a Docker container is a self-contained environment in which you can build and deploy software. It behaves much like a Linux machine, except that it is lightweight, starts quickly, and contains only what your project or software needs to run.

One of the best things about Docker is that you can move a container across platforms and still run it without installing a single dependency on the host, because all you need is Docker Engine.

It gives developers, system administrators, DevOps people and many others the peace of mind of not having to worry about an application not working when moved to different platforms.

Now that we know what Docker is, let’s see how it is a big help in the machine learning spectrum.

Docker For ML Development

One of the simplest and most common uses of Docker, not just in ML but across software development in general, is as a complete and reproducible development environment. Setting up a development environment can get messy, and resolving dependencies is often the hardest part. With Docker, it is as simple as a ‘docker run’ command in most cases. In the worst case, we have to build a custom Docker image with all the dependencies, which takes a little shell knowledge and only a few lines of script.

A Dockerfile is a plain-text file of build instructions that lets us create our own custom Docker images. In it we specify what the environment should consist of: the base image, the dependencies and software to install, and so on. After that, a couple of commands (typically ‘docker build’ followed by ‘docker run’) are enough to get the development environment up and running.
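To make the idea of a Dockerfile concrete, here is a minimal sketch in Python that writes out the text of one for a small ML environment. The base image `python:3.11-slim`, the package list and the `train.py` entry point are assumptions chosen for illustration, not anything the article prescribes.

```python
import json
from typing import Sequence


def render_dockerfile(base_image: str, packages: Sequence[str], entrypoint: str) -> str:
    """Return the text of a small Dockerfile for a Python ML project.

    All arguments are illustrative; pick the base image and packages
    that your own project actually needs.
    """
    lines = [
        f"FROM {base_image}",   # image the environment is built on
        "WORKDIR /app",         # working directory inside the image
    ]
    if packages:
        # Install the Python dependencies in a single layer.
        lines.append("RUN pip install --no-cache-dir " + " ".join(packages))
    lines.append("COPY . /app")  # copy the project into the image
    # Exec-form CMD: the default command when a container starts.
    lines.append("CMD " + json.dumps(["python", entrypoint]))
    return "\n".join(lines) + "\n"


dockerfile_text = render_dockerfile(
    base_image="python:3.11-slim",       # assumed base image
    packages=["numpy", "scikit-learn"],  # assumed dependencies
    entrypoint="train.py",               # hypothetical script name
)
print(dockerfile_text)
```

Saving the printed text as a file named `Dockerfile` in the project folder and running `docker build -t my-ml-env .` there would build the image; the tag `my-ml-env` is likewise only a placeholder.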

This would save a lot of time compared to trying to untangle the dependency errors caused by the different versions of applications in your system.

A simple example: you want to build a classifier, but the libraries and applications on your system are outdated and cannot be updated because other applications depend on those versions. The easiest way to handle this without disturbing your current set-up is to download or build a Docker image whose environment has the latest versions of everything you need for the new classifier.
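The classifier scenario above could look roughly like the following sketch, which only assembles the `docker run` command and does not execute it. The image name `my-ml-env:latest`, the project path and the script name are hypothetical; the flags used (`--rm`, `-v`, `-w`) are standard `docker run` options.

```python
import shlex
from typing import List


def dev_run_command(image: str, project_dir: str, script: str) -> List[str]:
    """Build a `docker run` command that runs a script from the host's
    project folder inside a container, leaving the host's libraries untouched."""
    workdir = "/workspace"  # assumed mount point inside the container
    return [
        "docker", "run",
        "--rm",                             # remove the container when it exits
        "-v", f"{project_dir}:{workdir}",   # bind-mount the project folder
        "-w", workdir,                      # start in the mounted folder
        image,
        "python", script,
    ]


cmd = dev_run_command(
    image="my-ml-env:latest",           # hypothetical image with up-to-date libraries
    project_dir="/home/me/classifier",  # hypothetical host path
    script="train.py",                  # hypothetical training script
)
print(shlex.join(cmd))
```

Passing `cmd` to `subprocess.run` on a machine with Docker installed would start the container; the old library versions on the host are never touched.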

Besides, many applications now ship official Docker images, which makes it effortless to run them on a local machine.

See the official Docker documentation for how to build a Docker image.

Docker for deploying ML Applications

Another popular use case of Docker is in the deployment phase. Conventional deployment methods require setting up the production environment to match the testing and development environments so that the application runs the same way across all of them. But this is not easy to achieve.

It is quite possible to miss some dependencies in any environment, for a number of reasons. In production, it might even be too late by the time the missing dependencies are noticed.

With Docker, it is just a matter of executing a couple of commands to spin up a container. Docker containers can be shipped across platforms, which is the very reason they are called containers. A containerised program developed on a Windows machine can be expected to run the same way on a Linux production server, provided both run the same container image. This adds a lot of flexibility to the development life cycle of an application.

If a new version of an application uses a new set of dependencies and has to go to production, just ship the new version’s container image to the production server and run it there. What’s more, you can run as many containers on a machine as its resources allow, which means you can deploy the new version without interfering with the one already serving production.
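Running a new version next to the current one might be sketched as follows. The image tags `my-model:1.0` and `my-model:2.0`, the container names and the port numbers are all made up for illustration; the code only builds the commands.

```python
from typing import List


def serve_command(name: str, image: str, host_port: int, container_port: int = 8000) -> List[str]:
    """Build a `docker run` command that starts a model server in the background."""
    return [
        "docker", "run",
        "-d",                                    # detached: run in the background
        "--name", name,                          # container name, must be unique per host
        "-p", f"{host_port}:{container_port}",   # host port -> container port
        image,
    ]


# The version already serving production (hypothetical names and ports).
current = serve_command("model-v1", "my-model:1.0", host_port=8000)
# The new version, started alongside it on a different host port.
candidate = serve_command("model-v2", "my-model:2.0", host_port=8001)

for command in (current, candidate):
    print(" ".join(command))
```

Because each container has its own name and host port, the two versions do not collide, and traffic can be moved to the new one once it has been checked.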

Docker For Distributed Computing Or Cloud

We are all pleased by the advantages offered by running a single container but when multiple Docker containers are combined they provide much more.

Clustering is one of the strengths of containers. Docker’s own tool for combining many containers is Docker Swarm, and Kubernetes, which originated at Google, serves the same purpose. Both make it possible to spin up containers across multiple machines or in the cloud and manage all of them collectively.

For ML applications specifically, this can be a huge advantage: the application can be broken into modules and deployed across different machines, all communicating with each other and managed through a single interface. This adds flexibility and scalability, since more machines or containers can be added with little effort.
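As a rough sketch of how adding containers works with Docker Swarm, the snippet below builds the `docker service create` and `docker service scale` commands for one module of an ML application. It assumes a swarm has already been initialised (for example with `docker swarm init`); the service name `inference` and the image `my-model:1.0` are hypothetical.

```python
from typing import List


def create_service(name: str, image: str, replicas: int) -> List[str]:
    """Build the command that starts `replicas` copies of an image as a Swarm service."""
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    return [
        "docker", "service", "create",
        "--name", name,
        "--replicas", str(replicas),  # number of containers across the cluster
        image,
    ]


def scale_service(name: str, replicas: int) -> List[str]:
    """Build the command that changes the number of running replicas."""
    if replicas < 0:
        raise ValueError("replicas cannot be negative")
    return ["docker", "service", "scale", f"{name}={replicas}"]


create_cmd = create_service("inference", "my-model:1.0", replicas=3)
scale_cmd = scale_service("inference", replicas=6)
print(" ".join(create_cmd))
print(" ".join(scale_cmd))
```

The swarm decides which machines the replicas land on, so scaling from three to six containers is a single command rather than a per-machine set-up.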

Containers can also be given access to external and distributed data, which lets the modules of an application share common data-oriented interfaces that support many data models.

Docker With GPU

ML cannot be compared with a normal software development task. It is an extremely expensive task, both resource-wise and time-wise.

GPUs are among the most powerful resources in High-Performance Computing (HPC). NVIDIA, a leading GPU manufacturer and provider of HPC solutions for deep learning, has left its mark on ML development with CUDA, a framework that much of the field relies on.

NVIDIA-Docker is a Docker solution for NVIDIA’s popular CUDA framework that helps containers make full use of the GPU. It provides a runtime that mounts the host’s NVIDIA driver into a container, so the container does not depend on the version of CUDA installed on the machine. This makes the GPUs of a node easy to use from Docker containers, enabling our resource-greedy deep learning algorithms to run on large datasets without wasting the resources the GPU supplies.
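A GPU-enabled container start might be sketched like this. The sketch uses the `--gpus` option of `docker run`, which is available in Docker 19.03 and later and assumes the NVIDIA container runtime is installed on the host; this is one possible route, not necessarily the set-up the article had in mind. The image name `my-cuda-image` is hypothetical.

```python
from typing import List, Sequence, Union


def gpu_run_command(image: str, command: Sequence[str], gpus: Union[str, int] = "all") -> List[str]:
    """Build a `docker run` command that exposes the host's GPUs to the container.

    `gpus` is either "all" or a number of GPUs to make available.
    """
    return [
        "docker", "run",
        "--rm",               # remove the container when it exits
        "--gpus", str(gpus),  # request GPU access (needs NVIDIA runtime on the host)
        image,
        *command,
    ]


# `nvidia-smi` lists the GPUs visible inside the container, a quick sanity check.
check_cmd = gpu_run_command("my-cuda-image", ["nvidia-smi"])
# A hypothetical training run limited to two GPUs.
train_cmd = gpu_run_command("my-cuda-image", ["python", "train.py"], gpus=2)
print(" ".join(check_cmd))
print(" ".join(train_cmd))
```

If the first command prints the expected GPU table when run on the host, the training command should be able to see the same devices.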

In A Nutshell

Docker can be employed in various phases of the ML development life cycle, such as data gathering, data aggregation and preprocessing, data exploration, model training and predictive analysis, application deployment and more. It is a breakthrough technology that is not limited to enterprise-level use. The community edition is free, and it is easy to set up and learn. In fact, one of the easiest ways to set up a personal development environment is through Docker.