Using Storage in Docker

Overview

There are two main categories of data:

  • persistent
  • non-persistent

Every Docker container gets its own non-persistent storage. This is created automatically and is tightly coupled to the container’s lifecycle.

To persist data, a container needs to store it in a volume. Volumes are separate objects whose lifecycles are decoupled from containers.

The Union File System

In Docker, the Union File System allows files and directories of separate file systems, known as branches, to be transparently overlaid, forming a single coherent file system. Contents of directories which have the same path within the merged branches are seen together in a single merged directory, within the new, virtual filesystem.
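Under the hood, the overlay2 storage driver uses the Linux overlay filesystem to implement this idea. As a rough illustration only (the directory paths are made up, and Docker normally manages these mounts for you), a union mount on Linux looks like this:

mount -t overlay overlay \
  -o lowerdir=/lower1:/lower2,upperdir=/upper,workdir=/work \
  /merged

Files from /lower1 and /lower2 appear read-only in /merged, while anything written to /merged lands in /upper.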

Non-Persistent Data

Every Docker container is created by adding a thin read-write layer on top of the read-only image on which it’s based. The writable layers exist in the filesystem of the Docker host: /var/lib/docker/<storage-driver>.

Any data written to this layer is deleted when the container is deleted.
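As a quick sketch (the container name and file are made up), you can see this by writing a file inside a container and then deleting the container:

docker run --name scratch-demo ubuntu bash -c "echo hello > /tmp/data.txt"
docker rm scratch-demo

Deleting the container deletes its writable layer, and /tmp/data.txt goes with it.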

This writable layer of local storage is managed on every Docker host by a storage driver.
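To see which storage driver a host is using, you can ask the daemon directly; this assumes a reasonably recent Docker CLI:

docker info --format '{{.Driver}}'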

When you use the COPY instruction in a Dockerfile, the files and directories you copy into the image are there when you run a container from the image.
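For example, a minimal Dockerfile (the file names are illustrative) that bakes a static page into an nginx image:

FROM nginx:latest
COPY index.html /usr/share/nginx/html/index.html

Build and run it, and the copied file is part of every container started from the image:

docker build -t my-web .
docker run -d --name web my-web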

Using tmpfs Mounts

If you’re running Docker on Linux, you can use tmpfs mounts. When you create a container with a tmpfs mount, the container can create files outside the container’s writable layer.

A tmpfs mount is temporary and lives only in the host's memory. When the container stops, the tmpfs mount is removed and any files written to it are lost.

This is useful for temporarily storing sensitive files that you don't want to persist on the host or in the container's writable layer.

The main limitations of tmpfs are:

  • You can’t share tmpfs between containers.
  • This functionality is only available if you’re running Docker on Linux.

To create a tmpfs mount, use the following format:

docker run -d \
  -it \
  --name tmptest \
  --mount type=tmpfs,destination=/app \
  nginx:latest

There is no source for tmpfs mounts.
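For simple cases, Docker also accepts the shorter --tmpfs flag; a rough equivalent of the command above would be:

docker run -d -it --name tmptest2 --tmpfs /app nginx:latest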

Sharing Local Storage Between Containers

You can share local storage between containers with the --volumes-from option in the docker run command. For example:

docker run -it --volumes-from first-container --name second-container ubuntu bash
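For this to work, first-container must already have at least one volume mounted. A minimal sketch (names and paths are illustrative):

docker run -it -v /shared-data --name first-container ubuntu bash

second-container then sees the same /shared-data mount, and anything written there by either container is visible to the other.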

Copying Files Between a Container and the Local Machine

To copy files between a container and the local machine, use:

docker container cp <container-name>:<path/filename> <local-filename>

For example:

docker container cp rn1:/random/number.txt number1.txt
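The same command works in the other direction, copying a file from the local machine into the container:

docker container cp number1.txt rn1:/random/number1.txt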

Modifying Images in Containers

A container can edit existing files from the image layers. However, image layers are read-only, so Docker uses a copy-on-write process. When the container tries to edit a file in an image layer, Docker makes a copy of that file into the container's writable layer, and the edit happens there.

Modifying the file in the container affects how that container runs, but it doesn’t affect the image or any other containers from that image. The changed file only lives in the writable layer for that one container. Any new containers use the original image.
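A quick way to see this (the container name is made up): modify a file that comes from the image in one container, then start a fresh container from the same image:

docker run --name cow-demo ubuntu bash -c "echo '# modified' >> /etc/os-release"
docker run --rm ubuntu tail -n 1 /etc/os-release

The fresh container still shows the original file; the edited copy exists only in cow-demo's writable layer.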

If you want to commit changes made inside a container to a new image before pushing it to a repo, write them to the container's filesystem and use docker commit (data stored in volumes is not included in the commit). For example:

docker run -it -v /vol1 --name file_container ubuntu bash
# inside the container's shell:
mkdir new && cd new
date > file1
exit
# back on the host:
docker commit file_container file_image
docker run -it file_image
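To push the committed image to a registry, you would tag it with a repository name first (the repository name here is a placeholder):

docker tag file_image <dockerhub-username>/file_image:latest
docker push <dockerhub-username>/file_image:latest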

Storage Drivers

Storage drivers are sometimes known as graph drivers. The appropriate storage driver often depends on your OS:

  • overlay2: current Ubuntu and CentOS
  • aufs: Ubuntu 14.04 and older
  • devicemapper: CentOS 7 and earlier

Configuring DeviceMapper

DeviceMapper is one of the Docker storage drivers available for some Linux distributions.

You can customize your DeviceMapper configuration using the daemon config file; a sketch follows the two modes below.

DeviceMapper supports two modes:

loop-lvm mode:

  • Loopback mechanism simulates an additional physical disk using files on the local disk.
  • Minimal setup, doesn’t require an additional storage device.
  • Bad performance, only use for testing.

direct-lvm mode:

  • Stores data on a separate device.
  • Requires an additional storage device.
  • Good performance, use for production.
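As a sketch, a direct-lvm configuration in /etc/docker/daemon.json might look like the following; the block device /dev/xvdf is an assumption (use whatever spare device your host has), and the daemon needs to be restarted afterwards:

{
  "storage-driver": "devicemapper",
  "storage-opts": [
    "dm.directlvm_device=/dev/xvdf",
    "dm.thinp_percent=95",
    "dm.thinp_metapercent=1"
  ]
}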

Using Bind Mounts

Bind mounts are an easy way to get data from your host onto a container. For example, you could run a Jekyll container and mount the static files from your host.

A bind mount maps an existing host file or directory to a file or directory inside the container. Essentially, it’s just two locations pointing to the same file(s). A bind mount bypasses the UFS, and the host files obscure any files already at that path in the container. Once the bind mount is removed, the container’s original files are visible again.

You can’t create a bind mount in a Dockerfile, only with a docker container run command. For example:

docker container run -v /users/username/stuff:/path/on/container <image-name>
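The newer --mount syntax achieves the same result and is more explicit (the paths are illustrative):

docker container run --mount type=bind,source=/users/username/stuff,target=/path/on/container <image-name>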

Using Volumes

Volumes create a special storage location outside of a container’s UFS.

Volumes are the recommended way to persist data in containers. Here’s the process:

  • Create a volume.
  • Create a container and mount the volume into it.
  • The volume is mounted into a directory in the container’s filesystem.
  • Anything written to that directory is stored in the volume.
  • If you delete the container, the volume and its data still exist.

Persistent data can be managed using several storage models.

Storage Models

Filesystem storage

  • Data is stored in the form of a file system.
  • Used by overlay2 and aufs.
  • Efficient use of memory.
  • Inefficient with write-heavy workloads.

Block storage

  • Stores data in blocks.
  • Used by devicemapper.
  • Efficient with write-heavy workloads.

Object storage

  • Stores data in an external object-based store.
  • Application must be designed to use object-based storage.
  • Flexible and scalable.

You can also deploy volumes via Dockerfiles using the VOLUME instruction: VOLUME <container-mount-point>. You cannot specify a directory on the host when defining a volume in a Dockerfile. This is because host directories differ according to the OS on which your Docker host is running. Consequently, defining a volume in a Dockerfile requires you to specify host directories at deploy-time.
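For example, a Dockerfile might declare a mount point like this (the path is illustrative):

FROM ubuntu:latest
VOLUME /app/data

At deploy time, you can then map a specific host directory onto it:

docker run -it -v /users/username/data:/app/data <image-name>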

Creating Volumes

To create a volume, use the following command:

docker volume create <volume-name>

By default, Docker creates new volumes with the built-in local driver. As the name suggests, volumes created with the local driver are available only to containers on the same node as the volume. You can use the -d flag to specify a different driver.
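For example (the volume name is arbitrary):

docker volume create -d local my-volume
docker volume ls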

Third-party volume drivers are available as plugins. Once the plugin is registered, you can create new volumes from the storage system using docker volume create with the -d flag.
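The general pattern looks like this; the plugin and driver names are placeholders, so check the plugin's documentation for the exact values and options:

docker plugin install <plugin-name>
docker volume create -d <driver-name> external-vol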

Use docker volume inspect to see what driver it’s using and where the volume exists.

All volumes created with the local driver get their own directory under /var/lib/docker/volumes on Linux. This means you can see them in your Docker host’s filesystem.
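For example, for a volume created with the local driver:

docker volume inspect my-volume
sudo ls /var/lib/docker/volumes/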

Mounting a Volume

To mount a volume to a container, use the following command:

docker container run -d --name <container-name> -v <volume-name>:<container-mount-point> <image-name>

If you specify a volume that doesn’t exist, Docker creates it for you. However, when you create a volume with a docker run command, you can’t add custom drivers or labels.

When using images that require a specific volume, you can find this information on Docker Hub. For example, postgres needs a VOLUME path of /var/lib/postgresql/data:

docker run -d --name postgres -v my-db:/var/lib/postgresql/data postgres:9.6.1

Incidentally, when running database containers, you normally need to add a password through an environment variable: -e POSTGRES_PASSWORD=password.
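Putting that together, a typical command might look like this (the password is just a placeholder):

docker run -d --name postgres \
  -e POSTGRES_PASSWORD=password \
  -v my-db:/var/lib/postgresql/data \
  postgres:9.6.1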

Removing Volumes

To remove a volume, use docker volume rm.

To delete all volumes not in use by any container, use docker volume prune.
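For example (prune asks for confirmation before deleting anything):

docker volume rm my-db
docker volume prune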

Kubernetes

For the DCA exam, you also need to know about storage in Kubernetes.