
Docker Image

Intro

A Docker image is a lightweight, standalone, and executable software package that contains everything needed to run a piece of software, including the code, runtime, system tools, system libraries, and dependencies. It is a portable and self-sufficient unit that encapsulates an application or service.

Images in Docker follow a layered architecture. This layered approach allows for efficient storage and sharing of images: when a Docker image is built, only the layers that have changed since the previous build need to be rebuilt, saving time and resources. We will look at this layer structure in more detail and then write a manifest describing each layer used to build the image. This manifest is called a Dockerfile.

Docker images are stored in a Docker registry, such as Docker Hub or a private registry. They can be easily shared, distributed, and pulled onto different environments, allowing for consistent deployment across different systems. When a Docker image is run, it creates a Docker container, which is an isolated and lightweight runtime instance of the image.

Using Docker images provides benefits such as portability, reproducibility, scalability, and consistency across different environments, making it easier to develop, deploy, and manage applications and services.

In this section, we will explore all these concepts and start working with the Docker image CLI.

Docker image on the surface

During previous sessions, we learned how to run a container and connect to it. We also saw how to execute commands and make changes within the running container.

In this section, we will delve into creating a new image from a running container. We will explore the finer details of Docker images and ensure that we are well-prepared to create a custom image using a Dockerfile manifest.

As a high-level definition, recall that a Docker image is a lightweight, standalone, and executable software package containing everything needed to run a piece of software: the code, runtime, system tools, system libraries, and dependencies.

Docker registry

A Docker registry is a central storage and distribution system for Docker images. It serves as a repository where Docker images can be stored, versioned, and shared among different users and systems.

A remote Docker registry is a registry that is hosted on a remote server or cloud platform. Examples of remote Docker registries include Docker Hub (the default public registry), Amazon Elastic Container Registry (ECR), Google Container Registry, and others. Remote registries provide a convenient way to access and share Docker images globally. They allow users to pull and push images to/from a central repository that is accessible over the network. By default, the Docker engine downloads images from Docker Hub.
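The registry an image comes from is encoded in its name: a short name resolves to Docker Hub, while a fully qualified name includes the registry host. A hedged sketch (registry.example.com and team/app are placeholder names, not real endpoints):

```shell
# Short name: resolved against Docker Hub (docker.io/library/nginx:latest)
docker pull nginx

# Fully qualified name: explicit registry host, repository, and tag
# (registry.example.com and team/app are placeholders)
docker pull registry.example.com/team/app:1.0
```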

On the other hand, a local Docker registry is a registry that is hosted on your local machine or on a local network. It is typically used for private or internal image distribution within an organization. Local registries provide a way to store and manage Docker images within your local infrastructure, ensuring better control and security over the images.

We will see later how to create our own Docker registry.

Pulling an image

The docker pull command is used to download Docker images from a container registry or repository. Here's how it works:

  1. When you run the docker pull command, you specify the name of the image you want to download, along with any optional tags or versions. For example: docker pull nginx or docker pull ubuntu:latest.

  2. Docker searches for the specified image in the local Docker engine's cache. If the image is found locally and matches the requested version or tag, Docker uses that image directly. If the image is not found locally or if you specifically want to download a fresh copy, Docker proceeds to the next step.

  3. Docker contacts the container registry specified in the image name (e.g., Docker Hub, a private registry, or another publicly available registry) to check if the requested image exists there.

  4. If the image is found in the registry, Docker starts downloading the image layers. Docker images are composed of multiple layers, which are incremental changes on top of each other. These layers are downloaded and assembled to create the complete image on your local system.
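The layer downloads from step 4 are visible in docker pull's progress output, and the resulting layers can be listed afterwards. A small sketch (digests and layer IDs will vary on your machine):

```shell
# Pull an image; each progress line corresponds to one layer being downloaded
docker pull nginx:latest

# List the layer digests that make up the downloaded image
docker image inspect nginx:latest --format '{{json .RootFS.Layers}}'
```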

Once the image is fully downloaded and assembled, it is stored in the local Docker engine's cache for future use. Docker improves performance and efficiency when working with images through a local storage mechanism called the Docker image cache. When you pull or build images, the resulting image layers are stored in the image cache on your local machine.

The image cache serves two main purposes:

  • Faster image retrieval: Once an image layer is downloaded or built, it is stored in the cache. If you need to use the same image again in the future, Docker can retrieve the layers from the cache instead of downloading or rebuilding them. This greatly speeds up the process of creating new containers based on already downloaded images.

  • Layer reuse: Docker image layers are reusable across different images. If multiple images share the same base layers, those layers are stored in the cache and can be reused across different images. This saves disk space as well as download or build time, as the common layers don't need to be duplicated for each image.

The image cache is managed by the Docker engine and is located in a specific directory on your local machine. By default, Docker keeps the cache at /var/lib/docker on Linux systems.
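You can see how much disk space the cache is consuming with docker system df:

```shell
# Summarize disk usage by images, containers, and volumes
docker system df

# Add -v for a per-image breakdown
docker system df -v
```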

You can list the locally stored images by running either of the following commands:

docker images 
OR
docker image ls

The cache can grow in size as you pull or build more images, so it's important to clean it periodically to reclaim disk space. You can delete an unused image using the following command:

docker rmi <image-id>

As an image may consist of multiple layers, you can remove all unused images and their cached layers with a single command:

docker image prune -a

Taking a snapshot

Considering that you have run a container with custom configurations, you may now want to create a new container with the same configuration. The docker commit command in Docker is used to create a new image from changes made to a running container. It allows you to capture the current state of a container and save it as a new image that can be used to create new containers in the future.

The basic syntax of the docker commit command is as follows:

docker commit [OPTIONS] CONTAINER [REPOSITORY[:TAG]]
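For example, assuming a running container named ubuntu001 (a hypothetical name), you might commit it with an author and message recorded in the image metadata:

```shell
# Commit the container "ubuntu001" as the new image my-ubuntu:v1
# -a records the author, -m a commit message (both optional)
docker commit -a "Your Name" -m "Add custom configuration" ubuntu001 my-ubuntu:v1

# Verify that the new image appears in the local image list
docker images my-ubuntu
```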

Now, you can use the newly created image to run your custom container. But what if you need to run the container in a different environment, or if a colleague asks you to run the same container?

The docker save command in Docker is used to save one or more Docker images as a tar archive. This command allows you to export Docker images from your local system so that they can be stored, transferred to another machine, or shared with others. You can store an image as a tar file using the command below (note that docker save produces a plain, uncompressed tar archive):

docker save -o image.tar <image_id>

Now, to facilitate sharing, you can share the image tar file. In the other environment, you can easily load the image into Docker by using the following command:

docker load -i <path/to/image.tar>

In Docker, the docker export command exports a container's file system as a snapshot, and the docker import command creates a new image from such a snapshot. Read more about docker import and docker export!
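As a hedged sketch of this container-level counterpart (the container and image names below are placeholders):

```shell
# Export the file system of the container "ubuntu001" as a tar archive
docker export -o rootfs.tar ubuntu001

# Import that snapshot as a new image; unlike save/load, this flattens
# the file system and does not preserve layer history or image metadata
docker import rootfs.tar my-snapshot:latest
```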

Docker image deep dive

In the Docker volume section, we introduced the Union File System (UnionFS). In fact, UnionFS is used as the underlying file system mechanism for Docker images.

Union File System (UnionFS) is a Linux kernel technology that allows for the merging of multiple file systems into a single unified view. It enables the creation of layered file systems by stacking multiple file systems on top of each other. Each layer appears as a separate directory, but the files and directories from lower layers are also accessible.

Here are some key concepts related to UnionFS:

  • Layers: UnionFS operates using a layered approach. Each layer represents a separate file system. When accessing a file, UnionFS scans through the layers from top to bottom until it finds the file or reaches the bottom-most layer. This layering allows for file systems to be combined while preserving the original content.

  • Copy-on-Write (CoW): UnionFS employs a copy-on-write mechanism to handle modifications to files. When a file is modified in an upper layer, UnionFS creates a new copy of the file in that layer without modifying the original file in the lower layer. This ensures that changes are isolated and don't affect lower layers. It also optimizes storage by only storing modified or new files in the upper layers.

  • Read-only Base Layer: The bottom-most layer in a UnionFS stack is typically a read-only base layer that contains the original file system or image. It remains unchanged, while new layers are added on top for modifications or additions.

  • OverlayFS: OverlayFS is a specific implementation of UnionFS in the Linux kernel. It provides a union mount that allows the merging of multiple directories into a single virtual directory. OverlayFS is the most commonly used UnionFS implementation in recent versions of Linux and is utilized by Docker, through the overlay2 storage driver, to manage image layers.

Let's simulate a Docker image by manually creating a layered file system with OverlayFS.

OverlayFS

Create four directories: /lower for the lower layer (base directory), /upper for the upper layer (container directory), /work for OverlayFS's internal scratch space, and /merged for the merged view.

sudo mkdir /lower
sudo mkdir /upper
sudo mkdir /work
sudo mkdir /merged

Create a text file in /lower containing the sample text: Here is the lower layer. Then create another file in /upper containing the sample text: Here is the upper layer.

Mount the OverlayFS with the lower and upper directories:

sudo mount -t overlay overlay -o lowerdir=/lower,upperdir=/upper,workdir=/work /merged

This command mounts the OverlayFS using /lower as the lower directory, /upper as the upper directory, /work as the required (initially empty) work directory, and /merged as the merged view.

Navigate to the /merged and check the result.
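If the two sample files were named, say, lower.txt and upper.txt (the names are up to you), the merged view also lets you observe copy-on-write in action:

```shell
# Files from both layers are visible in the merged view
ls /merged

# Writing through the merged view triggers copy-on-write:
# the modified file is copied into /upper; /lower stays untouched
echo "modified" | sudo tee /merged/lower.txt
ls /upper
cat /lower/lower.txt
```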

This is a simplified simulation and does not cover all the functionalities and features of Docker images. Docker images include various metadata, layer management, and compression mechanisms, which are not replicated in this manual simulation. This simulation merely demonstrates the basic concept of layering file systems using UnionFS, similar to how Docker images are structured.

Practice your knowledge

You can practice these tasks in a terminal or command prompt with Docker installed on your machine. Make sure you have Docker properly installed and configured before starting the practice scenario.

Remember to refer to Docker's official documentation for detailed information on each command and its usage. Happy learning!

Scenario

Task 1: Run an Ubuntu container

Run a container with the latest version of the Ubuntu image. Export the below environment variables:

VAR1=ubuntu
VAR2=webserver

Map port 3000 of the host to the container, and finally name the container ubuntu001.
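One possible invocation, assuming the Nginx instance you will configure later listens on container port 3000 as well:

```shell
# Start an interactive Ubuntu container named ubuntu001 with two
# environment variables and host port 3000 mapped to container port 3000
docker run -it --name ubuntu001 \
  -e VAR1=ubuntu -e VAR2=webserver \
  -p 3000:3000 \
  ubuntu:latest
```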

Task 2: Customize the container

Try to install the below packages:

  • vim
  • Python
  • Nginx

Create a custom Nginx configuration that listens on port 3000 and shows a custom welcome screen (try copying an index.html from the host to the container).

Task 3: Docker commit

Use the docker commit command to create a new image from the modified container, and verify that the image is available.

Task 4: Docker save

Next, use the docker save command to export the newly created image as a tar file, and share it with another colleague.

Task 5: Run a new container

Load the new image and run a new container. Navigate to http://localhost:3000 to verify the new container.
