27 - January - 2017

Dev Environments With ZFS and Containers

Post by Andrew A

Development environments are used to develop and test changes before pushing code further through the release pipeline. Giving each developer their own environment allows changes to be made within the codebase without affecting other developers or environments.

Traditionally, these environments would be locally based on the developer’s machine with all of the libraries and programs needed to run the code being installed too. We’ll examine development environments from the perspective of a web developer, so the needed programs would include a web server and a database server.

The Problem

There are two main issues with local development environments: isolation and resources.

Isolation

Before virtual machines (VMs) were common, many developers had a single web server and database server installed on their computer. Multiple sites (and multiple copies of the same site) were separated at the directory level with each copy of the site taking up the same amount of space as its parent.

Having all sites served by the same web and database servers presents a few problems:

  • Server level configuration is applied to every site

  • Software versions are limited at server level

  • Each site could potentially access other sites data

  • Configuration becomes cluttered over time

Once virtualisation became available to desktop hardware, software such as Virtualbox made it possible to create VMs which allowed the developer to have an entire server for each website or for each client. This approach meant that the development environment could now be tailored to match production environments and can be completely isolated from other development environments.

Resources

Before VMs, each site would only take up the code and database’s size of disk space. Generally, as only one web server and one database server ran at a time, the resources of only a single instance of each would be used.

A VM is an entire computer, from the virtual hardware assigned, to the Linux operating system, to the applications such as the web server. This means that each VM needs more resources to run than just a single web server with multiple copies of code. Each VM will be allocated its own memory, CPU and disks - taking up more resources on a developers local machine.

Solving the problem

To solve the problems of isolation and resources, we’ll need to use different technologies to address the individual concerns of each.

To address the issues of isolation and indirectly address resource usage, container technology can be used. Amazon Web Services (AWS) describe containers as being “a method of operating system virtualization that allow you to run an application and its dependencies in resource-isolated processes”. Using containers grants several benefits:

  • Isolation - each container will be isolated from other containers

  • Better resource use - unlike VMs, we don’t need to assign memory and disk space for the operating system

  • Faster - even with VM snapshots, creating new containers is very quick

Even though containers can reduce the disk space usage of development environments compared to VMs, it’s possible to further save space by using a filesystem that supports snapshotting and copy-on-write (COW) resource management. Copy-on-write allows data to be shared until a change is made to that data, at which point, a copy is made. This allows multiple copies of the same code & database to exist without taking up the expected amount of disk space, which is beneficial if a developer wants to use a container to work on several features on the same site at the same time.

Tech used

To demonstrate container technology, Docker will be used. Although there are other container engines available for use, Docker is the currently ubiquitous choice.

ZFS will be used for snapshotting and for copy-on-write.

Although ZFS is available for OSX (https://openzfsonosx.org/wiki/Downloads), at the time of writing, 10.12 does not have a stable version. Ubuntu Xenial will be used instead for the OS.

ZFS

ZFS is a filesystem that was originally designed by Sun Microsystems in the early 2000s. ZFS was designed to solve several problems that filesystems at the time had. It was designed to be easily scalable, have native snapshots for easy backup and restoration of data and check for corruptions via checksumming and if necessary, self heal.  By controlling both of the, traditionally separate, facets of data management - the physical management (such as hard disks) and the file management (a filesystem such as NTFS), ZFS has knowledge of and complete control of everything that makes use of it.

ZFS filesystems are built on top of virtual storage pools called zpools. A zpool is made up of virtual devices that are made up of block devices, for example a disk.

ZFS snapshots

When ZFS writes new data, the blocks containing the old data can be retained, allowing a snapshot version of the file system to be maintained. ZFS snapshots are created very quickly, since all the data composing the snapshot is already stored. They are also space efficient, since any unchanged data is shared among the file system and its snapshots.

Creating filesystems from snapshots

Writeable snapshots ("clones") can also be created, resulting in two independent file systems that share a set of blocks. As changes are made to any of the clone file systems, new data blocks are created to reflect those changes, but any unchanged blocks continue to be shared, no matter how many clones exist. This is an implementation of the copy-on-write principle and is what we’ll be taking advantage of to create new environments.

ZFS example

Installation

Install ZFS

apt install zfsutils-linux

Before creating a ZFS filesystem, we need to create a zpool for it:

zpool create blogpost /dev/sdb

/dev/sdb is a 2GB disk attached to the machine. The zpool will take up the entire available space of the disk.

zpool list shows:

NAME       SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH
blogpost  1.98G    64K  1.98G         -     0%     0%  1.00x  ONLINE

And mount shows:

/blogpost on /blogpost type zfs (rw,relatime,xattr,noacl)

You can change the mountpoint at creation time by passing -m and a path.

To create the filesystem:

zfs create blogpost/master

zfs list will show the filesystem we created and the root filesystem:

NAME              USED  AVAIL  REFER  MOUNTPOINT
blogpost          250K  1.92G    19K  /blogpost
blogpost/master    19K  1.92G    19K  /blogpost/master
Snapshots

To demonstrate the copy-on-write/snapshot features, create some test data:

dd if=/dev/zero of=/blogpost/master/file-1mb.txt count=1024 bs=1024
dd if=/dev/zero of=/blogpost/master/file-2mb.txt count=2048 bs=1024

And now create the snapshot:

zfs snapshot blogpost/master@testsnapshot

zfs list -t snapshot will show snapshots:

NAME                             USED    AVAIL        REFER     MOUNTPOINT
blogpost/master@testsnapshot      0        -         3.02M             -

Clone the snapshot by specifying the clone and the destination filesystem. In this case, we’re taking the clone of master called ‘testsnapshot’ and creating a filesystem called development:

zfs clone blogpost/master@testsnapshot blogpost/development

Listing the files in /blogpost/development will show the files:

drwxr-xr-x 2 root root       4 Jan 10 17:51 ./
drwxr-xr-x 4 root root       4 Jan 10 17:57 ../
-rw-r--r-- 1 root root 1048576 Jan 10 17:51 file-1mb.txt
-rw-r--r-- 1 root root 2097152 Jan 10 17:51 file-2mb.txt

Even though you can see the files and their sizes, only 1k of space is taken up instead of 3MB. The 3MB of data is being referenced rather than existing.

zfs list:

NAME                         USED  AVAIL  REFER  MOUNTPOINT
blogpost/development           1K  1.92G  3.02M  /blogpost/development
blogpost/master              3.02M  1.92G  3.02M  /blogpost/master

After making a change to the 1mb file in /blogpost/development, zfs list now looks like:

NAME                         USED  AVAIL  REFER  MOUNTPOINT
blogpost/development          1.14M  1.92G  3.15M  /blogpost/development
blogpost/master             3.02M  1.92G  3.02M  /blogpost/master

Docker

Docker Example

Installation

(Steps below are for Ubuntu 16.04. See https://docs.docker.com/engine/installation/ for other operating systems)

sudo apt install apt-transport-https ca-certificates
sudo apt-key adv --keyserver hkp://ha.pool.sks-keyservers.net:80 --recv-keys 58118E89F3A912897C070ADBF76221572C52609D
echo "deb https://apt.dockerproject.org/repo ubuntu-xenial main" | sudo tee /etc/apt/sources.list.d/docker.list
sudo apt update
sudo apt-cache policy docker-engine
sudo apt install linux-image-extra-$(uname -r) linux-image-extra-virtual
sudo apt install docker-engine
Usage
docker run -d -p 80:80 tutum/hello-world

The command above will automatically download an image called “hello-world” from the Tutum repository on the docker hub. Once downloaded, it will start the container in detached mode (runs in the background) and publishes port 80 so that we can connect. Once running, opening the page in the browser gives:

Screen Shot 2017-01-11 at 11.28.26.png

Now that the image is downloaded (and already built as it’s stored in the Docker Hub), starting another container is as simple as changing the port and running the command again:

docker run -d -p 8080:80 tutum/hello-world

This will instantly start another container. We can use ‘docker ps’ to see the running containers:
 

CONTAINER ID        IMAGE               COMMAND                     STATUS              PORTS
21def91c5b50        tutum/hello-world   "/bin/sh -c 'php-fpm "     Up 28 seconds   0.0.0.0:8080->80/tcp
fe4b2176f151        tutum/hello-world   "/bin/sh -c 'php-fpm "     Up 39 minutes    0.0.0.0:80->80/tcp

So using two docker commands, we have 2 separate containers running Nginx and PHP - yet only using the process’ amount of resources instead of an entire operating systems, in the case of the VMs:

CONTAINER           CPU %               MEM USAGE / LIMIT
21def91c5b50        0.02%               3.977 MiB / 488.4 MiB
fe4b2176f151        0.01%               4.516 MiB / 488.4 MiB

Putting it all together

Using the above examples, it should be possible to see how containers and a copy-on-write filesystem can be used together in order to create new development environments quickly and cheaply.

The proof of concept techniques used above can be expanded into working with a minimal PHP developer’s setup - Nginx, PHP & MySQL. Docker-compose will be used in order to simplify the defining and running of multiple containers.


Note:

The PHP container uses a custom image which extends the official php-7 image and adds mysql pdo extensions. To create this image, create a file called Dockerfile containing:

FROM php:7-fpm

RUN docker-php-ext-install mysqli pdo pdo_mysql

Running the following will build the image.

docker build -t blogpost-web .

 


Docker-compose reads a docker-compose.yml file to determine what to do. The syntax is similar to that of a normal Dockerfile. Create a file in /blogpost/master called docker-compose.yml and use the following:

web:
   image: nginx:latest
   ports:
       - "80:80"
   volumes:
       - ./code:/code
       - ./site.conf:/etc/nginx/conf.d/site.conf
   links:
       - php

php:
   image: blogpost-web
   volumes:
       - ./code:/code
   links:
       - db

db:
 image: mysql
 volumes:
   - ./database:/var/lib/mysql/
 ports:
   - "3306:3306"
 environment:
   MYSQL_ROOT_PASSWORD: master
   MYSQL_USER: master
   MYSQL_PASSWORD: master
   MYSQL_DATABASE: master

docker-compose up will download the images (if needed) and start the containers.

Test Data

In order to test the functionality, a small file is placed in the code directory that connects to the DB and prints out the contents of the messages table. An example is:

$db = new PDO('mysql:host=db;dbname=master;charset=utf8mb4', 'master', 'master');
echo("I am master<br/><br/>Messages:</br></br>");
foreach($db->query('SELECT * FROM messages') as $row) {
   echo $row['message'];
}

The disk space usage now looks like:
 

NAME                       USED  AVAIL  REFER  MOUNTPOINT
blogpost/master/code        20K  1.72G    20K  /blogpost/master/code
blogpost/master/database   210M  1.72G   210M  /blogpost/master/database

So the database is currently taking up 210MB. Traditionally, if we wanted to duplicate the database for another branch/feature, we’d copy the directory, thereby increasing the disk space usage to 420MB.

To snapshot the master branch, use the following commands:

zfs snapshot blogpost/master/database@masterdatabase
zfs snapshot blogpost/master/code@mastercode

Then create the development filesystem and then clone the master snapshots to the development filesystem:

zfs create blogpost/development
zfs clone blogpost/master/code@mastercode blogpost/development/code
zfs clone blogpost/master/database@masterdatabase blogpost/development/database

zfs list now shows the master & development filesystems:

NAME                            USED  AVAIL  REFER
blogpost/master/code             20K  1.72G    20K
blogpost/master/database        210M  1.72G   210M
blogpost/development/code         1K  1.72G    20K
blogpost/development/database     1K  1.72G   210M

The code and database for the development branch is only taking up 2KB disk space instead of 210MB! Quite a saving if used with large databases and codebases.

Running docker-compose up in the development directory brings up the 3 containers. Edit the test PHP and replace ‘master’ with ‘development’, so that we can prove which codebase is being used.

When visiting the master site, you should see

Screen Shot 2017-01-11 at 16.46.24.png

And when visiting the development site, you should see:

Screen Shot 2017-01-11 at 16.46.28.png

zfs list shows:
 

NAME                            USED  AVAIL  REFER
blogpost/master/code             29K  1.70G    20K
blogpost/master/database        210M  1.70G   210M
blogpost/development/code      9.50K  1.70G    20K
blogpost/development/database  12.6M  1.70G   210M

So although some extra space has been used, nearly 200MB is still saved even though an entirely new development environment now exists.

Get a shell on the development DB container by running: docker exec -it development_db_1 /bin/bash

And insert some data:

mysql -umaster -pmaster master -e ‘insert into messages (message) values ("This is the development branch");’

Refreshing the page on your development web server should now show another message:

Screen Shot 2017-01-11 at 16.56.44.png

NAME                            USED  AVAIL  REFER
blogpost/master/code             29K  1.70G    20K
blogpost/master/database        210M  1.70G   210M
blogpost/development/code      9.50K  1.70G    20K
blogpost/development/database  12.8M  1.70G   210M

So even with a change in the database, the majority of the data is referenced from the snapshotted data with only 12.8MB taking up space on the disk.

Conclusion

The examples above have shown that it is possible to create quick andresource effective development environments locally. We have shown how containers can be used to isolate environments which provide a layer of security and cleanliness whilst still saving space and time when compared to traditional VMs. We have also shown how using a copy-on-write filesystem such as ZFS can be used to quickly and easily clone environments and reduce disk space usage, allowing a developer to have more environments than a traditional setup might allow.

The commands used can easily be scripted to make the process interactive and even faster.

Profile picture for user Andrew A

Andrew A

Comments

Good article.

Just want to add that ZFS isn't especially stable on Linux yet but TrueOS (FreeBSD) has been using ZFS as its default FS for some time. TrueOS based on FreeBSD is arguably a more stable OS than Linux anyway. I've been using it for years (previously known as PCBSD). www.trueos.org
TrueOS can create chroot environments (jails) easily as well, walled off from the main OS.

druplicate 28th January 2017

Thanks for this, it's the closest example I've found of somebody setting up developer environments exactly how I'd imagined they ought to work.

We've got a 500GB+ database which currently all developers share one copy of, so I was hoping to use ZFS clones of that database to allow every developer to have writable access to their own sandboxed view of that database to stop them constantly breaking each other's work!

beveradb 27th April 2017

Add new comment

Share this article

Our thoughts

Let's work together

Get in touch and find out how we can empower your organisation.
Back to top