Learn how to share protected data with user-shared Jupyter Notebooks using a BYOD approach
You have an application (for example, a Django/Python app, or even a Jupyter notebook containing ML code) running inside a Kubernetes pod, i.e., it has been containerized with Docker and launched within a Kubernetes cluster. (I am assuming you have some familiarity with the application containerization process.)
For this tutorial, I will be using Google Kubernetes Engine (GKE) on the Google Cloud Platform. You can create a new GCP account with your Gmail account; you will receive $300 in credit to use within the year, which is plenty to follow along with this tutorial. I am assuming that you know how to launch a Kubernetes cluster on GCP.
Using the GCP developers console, launch the cloud shell to create a cluster using the Google Kubernetes Engine. It looks something like this –
Before we create the Kubernetes cluster, we have to create a Google Cloud Storage (GCS) bucket to store the protected data. The bucket needs a globally unique name so that the sidecar container (a Docker container with a GCS FUSE mount point) can interact with it.
You can directly use the GUI from the developers’ console to create a bucket, drag-and-drop some test data, and proceed to create the Kubernetes cluster using GKE.
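If you prefer the CLI, the same bucket setup can be sketched with gsutil. The bucket name sidecar-test and the local test file are assumptions here; bucket names must be globally unique, so substitute your own:

```shell
# Create a bucket in the same region as the cluster (name is an assumption;
# pick your own globally unique name):
gsutil mb -l us-east1 gs://sidecar-test/

# Upload some test data into the '/test/' path used later in this tutorial:
gsutil cp ./testfile.txt gs://sidecar-test/test/
```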
gcloud container clusters create \
  --scopes=https://www.googleapis.com/auth/devstorage.read_write,https://www.googleapis.com/auth/cloud-platform \
  --machine-type=n1-standard-2 \
  --num-nodes=2 \
  --preemptible \
  --zone=us-east1-b \
  --cluster-version=latest \
  sidecar-gcsfuse-test   ## name of your cluster; adjust --zone to yours
Once the cluster is created, add relevant permissions –
kubectl create clusterrolebinding cluster-admin-binding \
  --clusterrole=cluster-admin \
  --user=$(gcloud config get-value account)   ## bind to your own GCP account
Once the cluster is ready, we have to create a persistent volume (PV) and a persistent volume claim (PVC) for the sidecar container to use.
The sidecar is designed with a bi-directional mount point: it connects upstream to the Google bucket to fetch the privileged data and downstream to the application to share that data. The application then accesses the data in non-privileged mode, ensuring that it cannot alter the data in any form.
The application also gets its own PV and PVC, where it stores and modifies the data it has accessed from the bucket, without altering the original data in any way.
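The access pattern above can be dry-run in plain shell. The temp dirs below are stand-ins for the real mounts (in the cluster, the sidecar's read-only mount would be /data/sidecar-test and the scratch area would be the application's own PV):

```shell
# Stand-ins for the two mounts (assumptions for illustration only):
DATA_DIR=$(mktemp -d)      # stands in for the sidecar's read-only mount
SCRATCH_DIR=$(mktemp -d)   # stands in for the app's own PV

# Simulate a protected file delivered by the sidecar:
printf 'a,b\n1,2\n' > "$DATA_DIR/dataset.csv"

# The app copies the data into its scratch volume and modifies only the copy:
cp "$DATA_DIR/dataset.csv" "$SCRATCH_DIR/dataset.csv"
printf '3,4\n' >> "$SCRATCH_DIR/dataset.csv"

# The original stays untouched:
cat "$DATA_DIR/dataset.csv"
```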
To deploy a PV and PVC for the sidecar, we can use an NFS mount or simply a standard GCE disk.
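For the GCE-disk route, the sidecar-pv-pvc-fuse.yaml applied later might look roughly like this. This is a sketch, not the exact file from the tutorial; the names (sidecar-test), size, and fsType are assumptions chosen to match the disk created below:

```yaml
# Sketch of sidecar-pv-pvc-fuse.yaml (names and sizes are assumptions):
apiVersion: v1
kind: PersistentVolume
metadata:
  name: sidecar-test
spec:
  capacity:
    storage: 200Gi
  accessModes:
    - ReadWriteOnce
  gcePersistentDisk:
    pdName: sidecar-test   # the GCE disk created below
    fsType: ext4
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: sidecar-test
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 200Gi
```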
We use a deployment.yaml to deploy the sidecar. We have to make sure we link to the correct file path, in this case, '/test/' of the GCS bucket with the secure data. The key fragments are shown below (reconstructed; the surrounding fields are elided): the volume mount, a postStart lifecycle hook that mounts the bucket with gcsfuse when the container starts, and a preStop hook that unmounts it on shutdown.
---
containers:
  - name: gcsfuse-test
    ...
    volumeMounts:
      - name: sidecar-test
        mountPath: /data/sidecar-test
    lifecycle:
      postStart:
        exec:
          command: ["gcsfuse", "-o", "allow_other", "-o", "nonempty", "sidecar-test", "/data/sidecar-test"]
      preStop:
        exec:
          command: ["fusermount", "-u", "/data/sidecar-test"]
volumes:
  - name: sidecar-test
    ...
Create a persistent disk for the sidecar. Then deploy the PV, PVC, and the deployment.yaml.
Deploy the sidecar –
gcloud compute disks create sidecar-test --size=200GB --zone=us-east1-b
kubectl apply -f sidecar-pv-pvc-fuse.yaml
kubectl apply -f sidecar-deployment.yaml
Once your sidecar container is deployed, test it by making sure you can see it in the list of deployments and pods.
kubectl get deployment
kubectl get pod
Then shell into the sidecar container and try writing and deleting files in the GCS bucket it is associated with. Note that kubectl exec targets a pod, so use the pod name from kubectl get pod:
kubectl exec -it <your-gcsfuse-test-pod> -- /bin/bash
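Once inside, a quick write/delete check looks like this. The mount path /data/sidecar-test is the one assumed in the deployment above; the default below falls back to a temp dir so the commands can be dry-run outside the pod:

```shell
# Inside the sidecar pod, set MOUNT=/data/sidecar-test; the temp-dir
# default is only so these commands can be dry-run anywhere.
MOUNT=${MOUNT:-$(mktemp -d)}

echo "hello from the sidecar" > "$MOUNT/write-test.txt"   # should appear in the bucket
cat "$MOUNT/write-test.txt"
rm "$MOUNT/write-test.txt"                                # should disappear from the bucket
```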
The following video shows the GCS-FUSE-mounted sidecar container writing and deleting secure data in a Google bucket.
Thanks for reading! Please feel free to leave a response if you have any comments or feedback.
Next, we will take a look at deploying a Jupyter notebook and connecting it with the sidecar container to obtain bucket data in an unprivileged mode.
Part 2 — Deploy a Jupyter notebook and connect it with the GCS-FUSE sidecar container.