Samuel Cozannet
on 7 March 2017
GPUs and Kubernetes for deep learning — Part 2/3: Adding storage
Earlier this week we built a GPU cluster and installed Kubernetes so that we can do some advanced data processing.
What do you need right after you have GPUs? Data, data and more data. Technically, if you have looked at any of the tutorials for TensorFlow or the recent PaddlePaddle blog posts, you’ll have seen that every one of them starts by running a Kubernetes job to download and prepare a dataset.
You can’t store data in containers directly, but there are many ways of providing a storage layer to Kubernetes pods:
- On Premises options: Ceph RBD, GlusterFS, Fibre Channel, iSCSI, vSphere Volume, Quobyte Volumes, VMware Photon, or simply NFS (and probably others I forgot)
- In the cloud: any of the block primitives of the clouds such as AWS EBS, Google Persistent Disks, Azure Storage (File or Disk)
- Both: host storage (hostPath)
In June 2016, AWS made EFS (Elastic File System) generally available. EFS is an NFSv4-compliant storage layer, which can very easily be leveraged to provide ReadWriteMany PVs to Kubernetes pods. As it is elastic, it is a very nice option for storing datasets, whose size you cannot always predict when they are remotely hosted. A single file can grow up to 52TB. And it doesn’t require funky IAM Roles to connect to it, just an NFS client and a few rules you can apply post-deployment.
TL;DR: it’s worth looking at EFS. Even more in our context of Kubernetes for science.
Let us see how we can add an Elastic File System to our GPU cluster for this part 2, before getting to the real stuff in Part 3. We could just copy and paste a tutorial here but, as far as I know, they all do it with the UI, which takes ages and cannot be automated. I want to automate all the things.
The Plan
In this blog, you will:
- Re-use the cluster we deployed last time
- Programmatically add an EFS File System and Mount Points into your nodes
- Verify that it works by adding a PV / PVC in k8s.
Requirements
For what follows, it is important that:
- You understand Kubernetes 101
- You have admin credentials for AWS
- You read Part 1 and have the cluster ready at your fingertips
- You eat json and jq for breakfast. Every day.
Foreplay
There is none, it is aaaaall in the previous post. Have a look, deploy and come back.
Just this thing. You copied the post last time, here is the folder for this section:
cd ../k8s-using-efs
Ready? Let’s add this storage and get going!
Adding and connecting an EFS Storage Backend
Prerequisites
In order to manage EFS, you will need:
- the region where you deploy. In our case, us-east-1
- the VPC you run in. Juju uses the default VPC by default, so you can look it up in the AWS GUI or, if you did not deploy more VPCs, with:
REGION=us-east-1
VPC_ID=$(aws --region us-east-1 ec2 describe-vpcs \
| jq --raw-output '.[][].VpcId')
# vpc-b4ce2bd1
- All subnets you have instances deployed in
SUBNET_IDS=$(aws --region us-east-1 ec2 describe-subnets \
| jq --raw-output '.[][].SubnetId')
# subnet-26300e52
# subnet-418dea7b
# subnet-0cc0984a
# subnet-645b204c
- The security group to allow access from
# The jq filter is single-quoted, so it spans lines without backslashes
SG_ID=$(aws --region us-east-1 ec2 describe-instances \
| jq --raw-output '.[][].Instances[]
| select( .Tags[]?.Value
| contains("juju-default-machine-0"))
| .SecurityGroups[1].GroupId')
# sg-bc4101c0
Here we cheat a little bit. For each model you create, Juju creates a Security Group named with the format juju-<uuid>, where uuid is a randomly generated 32 char UUID. Then each machine inherits a secondary security group with the format juju-<uuid>-<id>, where uuid is the same as the generic SG, and id is the machine ID in the model.
As a result, straight after a deployment, each Juju machine will only have 2 SGs. We use this property and the default sorting method of jq to extract the correct value.
Actually, now that we talk about it, there is a little-known but useful feature in Juju: you can custom tag your instances (and therefore make the above much simpler).
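The VPC and subnet lookups above list everything in the region, which works on an account that only has the default VPC. As a rough sketch for busier accounts, the same values can be narrowed with the AWS CLI's own `--filters` and `--query` options. The function names below are made up for illustration and are not part of the scripts from this series.

```shell
# Print the ID of the default VPC in a region.
# Argument: region (for example us-east-1).
default_vpc_id() {
  aws --region "$1" ec2 describe-vpcs \
    --filters "Name=is-default,Values=true" \
    --query 'Vpcs[0].VpcId' --output text
}

# Print the IDs of the subnets that belong to one VPC, whitespace-separated.
# Arguments: region, VPC ID.
vpc_subnet_ids() {
  aws --region "$1" ec2 describe-subnets \
    --filters "Name=vpc-id,Values=$2" \
    --query 'Subnets[].SubnetId' --output text
}

# Usage (assumes the aws CLI is installed and configured):
#   VPC_ID=$(default_vpc_id us-east-1)
#   SUBNET_IDS=$(vpc_subnet_ids us-east-1 "${VPC_ID}")
```

This still returns every subnet of the VPC, not only the ones that currently host instances, which is harmless here since a mount target in an unused subnet simply sits idle.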
OK, we have the necessary data, let’s create our FS.
Creation of the EFS
Create and store the EFS ID with:
EFS_ID=$(aws --region us-east-1 efs create-file-system --creation-token $(uuid) \
| jq --raw-output '.FileSystemId')
# fs-69de7c20
# Note the full output:
# {
# "CreationToken": "f2513790-f2c8-11e6-a002-9735d97703bd",
# "LifeCycleState": "creating",
# "OwnerId": "131768076974",
# "FileSystemId": "fs-69de7c20",
# "NumberOfMountTargets": 0,
# "PerformanceMode": "generalPurpose",
# "CreationTime": 1487085556.0,
# "SizeInBytes": {
# "Value": 0
# }
# }
Now you need to create a mount target in each of the subnets you have instances in:
for subnet in ${SUBNET_IDS}
do
aws --region us-east-1 efs create-mount-target \
--file-system-id ${EFS_ID} \
--subnet-id ${subnet} \
--security-groups ${SG_ID}
done
# {
# "LifeCycleState": "creating",
# "IpAddress": "172.31.24.140",
# "OwnerId": "131768076974",
# "NetworkInterfaceId": "eni-8c0bb86c",
# "MountTargetId": "fsmt-d900b590",
# "SubnetId": "subnet-26300e52",
# "FileSystemId": "fs-69de7c20"
# }
# {
# "OwnerId": "131768076974",
# "NetworkInterfaceId": "eni-265845ce",
# "SubnetId": "subnet-418dea7b",
# "FileSystemId": "fs-69de7c20",
# "LifeCycleState": "creating",
# "MountTargetId": "fsmt-da00b593",
# "IpAddress": "172.31.63.172"
# }
# {
# "IpAddress": "172.31.6.243",
# "NetworkInterfaceId": "eni-8d930649",
# "SubnetId": "subnet-0cc0984a",
# "FileSystemId": "fs-69de7c20",
# "OwnerId": "131768076974",
# "MountTargetId": "fsmt-dc00b595",
# "LifeCycleState": "creating"
# }
# {
# "SubnetId": "subnet-645b204c",
# "FileSystemId": "fs-69de7c20",
# "NetworkInterfaceId": "eni-17f676e5",
# "OwnerId": "131768076974",
# "IpAddress": "172.31.44.34",
# "LifeCycleState": "creating",
# "MountTargetId": "fsmt-df00b596"
# }
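Every mount target in the output above is still in the "creating" state, so a script that mounts the file system straight away may fail. A minimal sketch of a wait loop follows, assuming the `aws efs describe-mount-targets` call and its `MountTargets[].LifeCycleState` field. The function name and the default retry values are arbitrary choices for this example.

```shell
# Poll until every mount target of a file system reports "available".
# Arguments: file system ID, region, max attempts (default 30),
#            delay between attempts in seconds (default 10).
wait_for_mount_targets() {
  local fs_id region attempts delay i states state pending
  fs_id="$1"; region="$2"; attempts="${3:-30}"; delay="${4:-10}"
  i=0
  while [ "${i}" -lt "${attempts}" ]; do
    # One LifeCycleState per mount target, whitespace-separated
    states=$(aws --region "${region}" efs describe-mount-targets \
      --file-system-id "${fs_id}" \
      --query 'MountTargets[].LifeCycleState' --output text) || return 1
    pending=0
    for state in ${states}; do
      [ "${state}" = "available" ] || pending=1
    done
    # Succeed only when at least one target exists and none is pending
    if [ -n "${states}" ] && [ "${pending}" -eq 0 ]; then
      return 0
    fi
    i=$((i + 1))
    sleep "${delay}"
  done
  return 1
}

# Usage: wait_for_mount_targets "${EFS_ID}" us-east-1 && echo "EFS is ready"
```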
And you are now ready to add storage in CDK. Before that, let’s see how we would consume it from an existing or new machine we control.
Consuming from a different host
To consume EFS from a node in one of the above subnets, run the following on the node:
sudo apt install -yqq nfs-common
mkdir efs
sudo mount -t nfs4 -o nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2 EFS_SERVICE_HOST:/ efs
where EFS_SERVICE_HOST is ${EFS_ID}.efs.${REGION}.amazonaws.com (in our example fs-69de7c20.efs.us-east-1.amazonaws.com)
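To avoid typing the host name by hand, the mount can be wrapped in a small helper. This is only a sketch: the function names are invented for this example, and it relies on the DNS name form that AWS documents for EFS, `<file-system-id>.efs.<region>.amazonaws.com`, resolving from inside the VPC.

```shell
# Build the DNS name of an EFS file system.
# Arguments: file system ID, region.
efs_dns_name() {
  printf '%s.efs.%s.amazonaws.com\n' "$1" "$2"
}

# Mount the file system on a local directory (needs nfs-common and sudo).
# Arguments: file system ID, region, target directory (default: efs).
mount_efs() {
  local fs_id region target
  fs_id="$1"; region="$2"; target="${3:-efs}"
  mkdir -p "${target}"
  sudo mount -t nfs4 \
    -o nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2 \
    "$(efs_dns_name "${fs_id}" "${region}"):/" "${target}"
}

# Usage: mount_efs "${EFS_ID}" "${REGION}" efs
```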
That’s it, not very hard. Let’s see for CDK now.
Connecting EFS and CDK
To consume storage like EFS in Kubernetes, you need 2 primitives:
- Persistent Volumes (PV), which describe a piece of (network) storage made available to the cluster, with a capacity and access modes. PVs are managed by the administrator.
- Persistent Volume Claims (PVC): a request for storage that Kubernetes binds to a matching PV, so that one or more pods can mount it. Note that a claim binds to a whole PV rather than to a slice of it. PVCs are consumed by the users of the cluster.
PVs and PVCs can have different access modes:
- ReadOnlyMany: like secrets, the volume can be read from many nodes, but no one can write.
- ReadWriteOnce: the volume can be mounted read-write by a single node at a time.
- ReadWriteMany: everyone can read and write. If you have to exchange files between different micro services, this is a simple way of doing so.
A manifest for a PV looks like
apiVersion: v1
kind: PersistentVolume
metadata:
  name: efs-volume
spec:
  capacity:
    storage: 100Gi
  accessModes:
    - ReadWriteMany
  nfs:
    server: EFS_SERVICE_HOST
    path: "/"
where EFS_SERVICE_HOST is ${EFS_ID}.efs.${REGION}.amazonaws.com. Here we can see that this PV uses the NFS primitive to consume EFS.
Luckily, in CDK all worker nodes have nfs-common installed by default, so this works out of the box and you don’t have to worry about it.
You can create this PV with the classic
kubectl create -f src/pv.yaml
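Since the only variable part of the PV manifest is the server name, it can also be rendered from the shell instead of edited by hand. The sketch below is one way to do that, not how the src/pv.yaml file of this series is produced. The `render_pv` name is hypothetical, and the server name uses the `<file-system-id>.efs.<region>.amazonaws.com` form documented by AWS.

```shell
# Print a PV manifest for an EFS file system on stdout.
# Arguments: file system ID, region, capacity (default 100Gi).
render_pv() {
  local fs_id region size
  fs_id="$1"; region="$2"; size="${3:-100Gi}"
  cat <<EOF
apiVersion: v1
kind: PersistentVolume
metadata:
  name: efs-volume
spec:
  capacity:
    storage: ${size}
  accessModes:
    - ReadWriteMany
  nfs:
    server: ${fs_id}.efs.${region}.amazonaws.com
    path: "/"
EOF
}

# Usage: render_pv "${EFS_ID}" "${REGION}" | kubectl create -f -
```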
One PVC consuming this could be:
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: efs-volume
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 50Gi
Our PVC requests 50Gi, which the 100Gi PV we provisioned earlier is large enough to satisfy. Note the vocabulary of requesting resources in the PVC, while the PV has a given capacity.
You can create this PVC with
kubectl create -f src/pvc.yaml
At this point what you have in your cluster is the ability for pods to claim this PVC, then access it to read/write stuff.
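Before pointing pods at the claim, it is worth checking that it actually bound to the volume. A small sketch, assuming `kubectl` is configured for the cluster. The helper names are invented for this example.

```shell
# Print the phase of a PVC (for example Pending or Bound).
# Argument: PVC name.
pvc_phase() {
  kubectl get pvc "$1" -o jsonpath='{.status.phase}'
}

# Succeed only if the PVC is bound to a PV.
# Argument: PVC name.
pvc_is_bound() {
  [ "$(pvc_phase "$1")" = "Bound" ]
}

# Usage: pvc_is_bound efs-volume && echo "efs-volume is ready"
```

A claim that stays in Pending usually means no PV matches its access mode or requested size.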
Consuming the PVC
PVCs are consumed as mount points by pods. Below is an example of a job that has a pod consuming a PVC:
apiVersion: batch/v1
kind: Job
metadata:
  name: data-consumer
spec:
  template:
    metadata:
      name: pi
    spec:
      containers:
        - name: data-consumer
          image: super-container:latest
          imagePullPolicy: Always
          volumeMounts:
            - mountPath: "/efs"
              name: efs
      volumes:
        - name: efs
          persistentVolumeClaim:
            claimName: efs-volume
      restartPolicy: Never
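Submitting such a job and blocking until it finishes can be scripted too. The sketch below assumes a kubectl recent enough to have the `kubectl wait` subcommand, which is newer than the release this post was written against. The function name and the src/job.yaml path in the usage line are hypothetical.

```shell
# Create a Job from a manifest and wait for it to complete.
# Arguments: manifest path, job name, timeout (default 600s).
run_job_and_wait() {
  local manifest job timeout
  manifest="$1"; job="$2"; timeout="${3:-600s}"
  kubectl create -f "${manifest}" || return 1
  kubectl wait --for=condition=complete --timeout="${timeout}" "job/${job}"
}

# Usage: run_job_and_wait src/job.yaml data-consumer
```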
From there, I will let your imagination run. A little help: Kubernetes has pods to define apps, and it makes sure that those pods are kept alive if you want.
But it also has a notion of Jobs. Jobs are essentially the equivalent of run-once tasks in Mesos: batch jobs that run once or on a cron schedule. A common practice when you write Helm packages or workshops for Kubernetes is to define a job that deploys a pod that downloads data to your EFS.
When it’s done, the real work can start. But that’s for next time.
Conclusion
This was a rather short blog post, but it will be instrumental for what is coming in the upcoming weeks.
What did we learn?
- Deploy a simple dev cluster on AWS in a few commands (nothing new here for veterans!)
- Automate the creation and connection of an EFS volume from the CLI. Now you can script this and create new ones in seconds.
- Consume EFS from the Canonical Distribution of Kubernetes (and actually from any Kubernetes)
Anything special about this? Yes.
EFS presents itself as an NFS endpoint, hence it remains extremely portable across deployments. CDK on premises uses Ceph as a backend, which also has an NFS option via the Rados GW. You wouldn’t need to change a line of your manifest to move from public cloud to on premises.
EFS does not require IAM roles to get consumed from instances (S3 for example requires either credentials or an IAM role), making it extremely easy to consume. There is one use case especially that is key. It is not possible to add an IAM Role to an existing instance, you can only modify an existing one. EFS bypasses this potential issue and can be used from pre existing infrastructure.
We saw that the PV has a size, but we did not set anything in the EFS definition. EFS will scale (about) indefinitely; the size is only a Kubernetes property of the PV.
That’s it folks. In the final post of this series, we will combine this post with GPUs and start investigating various Deep Learning frameworks on Kubernetes.
Stay Tuned!
Tearing Down
As seen in the previous post, whenever you feel like it, you can tear down this cluster with:
juju kill-controller aws/us-east-1
This will ask for confirmation then destroy everything… But now, you are just a few commands and a coffee away from rebuilding it, and a little script from adding storage again to it. So it’s not a problem, there are solutions!
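One caveat: the file system and its mount targets were created with the AWS CLI rather than by Juju, so it is safest to assume they are not cleaned up along with the controller. A minimal sketch of removing them by hand follows, using the `aws efs delete-mount-target` and `aws efs delete-file-system` calls. The function names and retry defaults are arbitrary, and this permanently deletes the data on the file system.

```shell
# List the mount target IDs of a file system, whitespace-separated.
# Arguments: file system ID, region.
efs_mount_target_ids() {
  aws --region "$2" efs describe-mount-targets --file-system-id "$1" \
    --query 'MountTargets[].MountTargetId' --output text
}

# Delete all mount targets of a file system, then the file system itself.
# Arguments: file system ID, region, max attempts (default 30),
#            delay between attempts in seconds (default 10).
delete_efs() {
  local fs_id region attempts delay targets target i
  fs_id="$1"; region="$2"; attempts="${3:-30}"; delay="${4:-10}"

  targets=$(efs_mount_target_ids "${fs_id}" "${region}") || return 1
  for target in ${targets}; do
    aws --region "${region}" efs delete-mount-target \
      --mount-target-id "${target}" || return 1
  done

  # The file system can only be deleted once no mount target is left
  i=0
  while [ "${i}" -lt "${attempts}" ]; do
    targets=$(efs_mount_target_ids "${fs_id}" "${region}") || return 1
    if [ -z "${targets}" ]; then
      aws --region "${region}" efs delete-file-system \
        --file-system-id "${fs_id}"
      return
    fi
    i=$((i + 1))
    sleep "${delay}"
  done
  return 1
}

# Usage: delete_efs "${EFS_ID}" us-east-1
```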