INTEGRATION OF A DEEP LEARNING MODEL WITH CI/CD OPERATIONS USING JENKINS AND DOCKER TO AUTOMATE THE COMPLETE DELIVERY AND DEPLOYMENT OF THE MODEL
INTEGRATION OF DEEP LEARNING WITH DEV-OPS
According to Wikipedia, MLOps (a compound of “machine learning” and “operations”) is a practice for collaboration and communication between data scientists and operations professionals to help manage the production ML (or deep learning) lifecycle.
Many of the failures in Data Science are due to the failure of DEPLOYMENT OF THE MODEL into a real-world scenario.
Another problem faced by Data Scientists is that we cannot keep changing the hyper-parameters again and again to push the accuracy of the model higher. It is a manual process which is very tedious, since a Deep Learning model takes a lot of time to train.
So this article is all about automating deep learning with the DevOps world, so that nothing is left to be done manually and the machine chooses the hyper-parameters on its own while checking the accuracy (a small conceptual sketch of this loop follows the problem statement below).
Let me give a brief description of the task I have completed in this article. This is the problem statement that has been solved here:
1. Create a container image that has Python3 and Keras or numpy installed, using a Dockerfile. This container acts as an environment for the production of the deep learning model.
2. When we launch this image, it should automatically start training the model in the container.
3. Create a job chain of Job1, Job2, Job3, Job4 and Job5 using the Build Pipeline plugin in Jenkins.
4. Job1: Pull the GitHub repository automatically when a developer pushes code to GitHub.
5. Job2: By looking at the code or program file, Jenkins should automatically start the image container with the respective machine learning software and interpreter installed, deploy the code and start training (e.g. if the code uses a CNN, then Jenkins should start the container that already has all the software required for CNN processing installed).
6. Job3: Train the model and report the accuracy or metrics.
7. Job4: If the accuracy metric is less than 80%, tweak the machine learning model architecture.
8. Job5: Retrain the model, or if the accuracy is good enough, notify that the best model has been created.
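To make the overall flow concrete, here is a plain-Python illustration of the retraining loop that Jobs 3, 4 and 5 implement together. This is only a conceptual sketch; the fake train_model() function, the extra_dense_layers counter and the 90% threshold (the value used later in Job4) are stand-ins for what the real Jenkins jobs and the cnn.py/filehandling.py scripts do.

TARGET_ACCURACY = 90.0   # the threshold Job4 checks against

def train_model(extra_dense_layers):
    # Stand-in for Job3: the real pipeline trains the CNN inside the container;
    # here we just simulate an accuracy that improves with every tweak.
    return 85.0 + 2.0 * extra_dense_layers

extra_dense_layers = 0
accuracy = train_model(extra_dense_layers)              # Job3: first training run
while accuracy < TARGET_ACCURACY:                       # Job4: compare the metric
    extra_dense_layers += 1                             # Job4: tweak the architecture
    accuracy = train_model(extra_dense_layers)          # Job3 again: retrain
print("Best model created with accuracy:", accuracy)    # Job5: report the result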
Before starting, I have created a Docker image in which I have configured Miniconda, which provides almost all the software and packages needed for implementing a Deep Learning model. This Docker image creates its own environment where the model can be trained. You can see my Docker image on Docker Hub.
You can also pull this image by simply typing:
docker pull nikhilgubbi/centos-miniconda3
The Dockerfile is straightforward: it starts from a CentOS base image and installs Miniconda3 on top of it.
Job1 - Pulling the Code from GitHub to the RHEL8 Machine using Poll SCM Triggering.
On the base RHEL8 VM, I have configured the working directory with git, and it pushes the code automatically using the hooks concept in git.
Whenever we commit any file, the code is pushed to the GitHub repository automatically. Here, we have created a CNN model on the MNIST dataset to predict handwritten digits.
All the code is provided in the GitHub repository.
Let's look at the configuration of Job1.
In the end, after downloading the code, this job copies all the files to a specified location. We have to use the following command in the Jenkins build's Execute Shell:
sudo cp * /root/deploy-dl-code
[ Directory where my code will be copied]
After the successful completion of Job1, Job2 will be triggered automatically.
Job2 - Building the Container (Environment) Specified for the Model Code and its Requirements.
In Job2, I have automated the process of building the container from the Miniconda Docker image. Besides building the container with all the software, I have made a requirements.txt file in which I have specified all the packages required for training our model (for example, keras and numpy). The requirements.txt file is also copied from GitHub to the target location. Let's see how Job2 works.
Here, Job2 is the downstream of Job1 and the upstream of Job3.
The complete shell script is given below:
echo
sleep 2s
# Pull the image only if it is not already present on the system
if ! sudo docker images | grep centos-miniconda3
then
    echo "++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++"
    echo "                             pulling image                             "
    echo "++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++"
    sudo docker pull nikhilgubbi/centos-miniconda3:8
    sleep 4s
    echo
fi
sleep 2s
# Start the container if it already exists, otherwise create it
if sudo docker ps -a | grep deploy-dl-code
then
    echo
    sleep 2s
    echo "+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++"
    echo "               Starting Existing Container deploy-dl-code               "
    echo "+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++"
    sudo docker start deploy-dl-code
    sleep 2s
    echo
else
    sleep 2s
    echo "++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++"
    echo "                  Building Container deploy-dl-code                    "
    echo "++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++"
    sudo docker run -dit --name deploy-dl-code -v /root/deploy-dl-code/MLOPS-AUTOMATION/:/dlcode nikhilgubbi/centos-miniconda3:8
    sleep 5s
    echo
fi
echo "++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++"
echo "         Installing all required packages from requirements.txt        "
echo "++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++"
echo
sleep 2s
sudo docker exec deploy-dl-code /root/miniconda3/condabin/conda install -n tensorflow --yes --file /dlcode/requirements.txt
The above job is designed in such a way that:
1. If the Docker image is not present on the system, it will be downloaded automatically from my Docker Hub repository. If it already exists, the job goes straight to building the container.
2. If the container is not present, it launches a container named deploy-dl-code and mounts the volume where our code is present into the container. If the container is already present, it won't be created again; if it is stopped, it will be started automatically.
3. After the container is built, the last command installs all the required packages inside the container, in the tensorflow environment by default.
After this completes, Job3 will be triggered automatically.
Job3 - Training our model inside the container.
In Job3, Jenkins starts training the model inside the container.
Here, Job3 is the downstream of Job2 and the upstream of Job4.
The shell command that makes the container start training the model (cnn.py) is:
sudo docker exec deploy-dl-code /root/miniconda3/envs/tensorflow/bin/python3 /dlcode/cnn.py
[ Path of my code]
After training the model, it stores the accuracy result in the cnn_resultbestaccuracy.txt file.
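The full cnn.py is in the GitHub repository and is not reproduced in this article, but based on the description (a Keras CNN trained on the MNIST digits that writes its accuracy to a text file) a minimal sketch could look like the following. The exact layer layout, epoch count and output path here are my assumptions, not the author's original code.

# Hypothetical sketch of cnn.py: train a small CNN on MNIST and write the
# test accuracy to a text file so that Job4 can read and compare it.
from tensorflow.keras.datasets import mnist
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense
from tensorflow.keras.utils import to_categorical

# Load and normalise the MNIST digit data
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train = x_train.reshape(-1, 28, 28, 1).astype("float32") / 255.0
x_test = x_test.reshape(-1, 28, 28, 1).astype("float32") / 255.0
y_train, y_test = to_categorical(y_train), to_categorical(y_test)

# A small CNN; Job4's filehandling.py would grow the Dense part of this model
model = Sequential([
    Conv2D(32, (3, 3), activation="relu", input_shape=(28, 28, 1)),
    MaxPooling2D((2, 2)),
    Flatten(),
    Dense(128, activation="relu"),
    Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
model.fit(x_train, y_train, epochs=1, batch_size=128)

# Save the accuracy (as a percentage) so the Jenkins jobs can compare it to the target
_, accuracy = model.evaluate(x_test, y_test, verbose=0)
with open("/dlcode/cnn_resultbestaccuracy.txt", "w") as f:
    f.write("%f" % (accuracy * 100))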
After its completion, Job4 will be triggered automatically.
Job4 - Tweaking the Model again to get the highest Accuracy.
In Job4, I have built the logic by which we can automatically improve the accuracy of our CNN model without changing the code manually. It took me around 6 hours to design this logic, which is the most interesting part of this article and is where the real-world use case actually gets solved.
In Job3, after the model is trained successfully, the accuracy result is saved in the cnn_resultbestaccuracy.txt file. In the first go, if the model gives an accuracy of less than 90%, Job4 runs another file called filehandling.py, which adds a Dense layer inside the main code. After this, the shell script invokes Job3 again to start building the model once more. For invoking Job3 from the shell, I have used the concept of triggering builds remotely with an authentication token.
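The article does not show filehandling.py itself, so here is a minimal sketch of the idea under the stated assumptions: read cnn.py, insert one more Dense layer just before the output layer, and write the file back so that the next run of Job3 trains a slightly deeper model. The marker line, layer size and file path are hypothetical.

# Hypothetical sketch of filehandling.py: tweak cnn.py by inserting an extra
# Dense layer in front of the output layer, then let Job3 retrain the model.
CODE_PATH = "/dlcode/cnn.py"                          # code mounted into the container
EXTRA_LAYER = '    Dense(64, activation="relu"),\n'   # layer added on every tweak
MARKER = 'Dense(10, activation="softmax")'            # the existing output layer

with open(CODE_PATH, "r") as f:
    lines = f.readlines()

# Insert the new layer just above the output layer of the Sequential model
for i, line in enumerate(lines):
    if MARKER in line:
        lines.insert(i, EXTRA_LAYER)
        break

with open(CODE_PATH, "w") as f:
    f.writelines(lines)

print("Added one more Dense layer to", CODE_PATH)

Because Job3 is triggered again after this script runs, every failed accuracy check grows the network by one more layer until the target accuracy is reached.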
In simple words, Job4 works something like this:
a. If the model accuracy is less than 90%, the shell script runs the filehandling.py file and then invokes Job3 to train the model again. This process repeats until we get an accuracy of more than 90%.
b. Once the accuracy is above 90%, it comes out of the loop and the tweaking process stops. Here is the shell script command:
train=$(sudo cat /root/deploy-dl-code/test.txt)
pred=90.000000
st=`echo "$train < $pred" | bc`
if [ $st -eq 1 ]; then
    ## If accuracy is not the desired accuracy
    echo "Tweaking Model again by triggering Job3"
    sudo docker exec deploy-dl-code /root/miniconda3/envs/tensorflow/bin/python3 /dlcode/filehandling.py
    curl -X POST http://192.168.99.111:8080/job/Job3-Train_Model/build?token=job3 --user nikhilgubbi:11cecf8d41bce413ad35249c815a28e2a8
else
    ## If accuracy is greater than the desired one
    echo "Model Successfully tweaked and your Accuracy is improved"
fi
After its Completion, Job5 will automatically get triggered.
Job5 - Display Best Accuracy.
Finally, in Job5, Jenkins displays the accuracy result. For this, I have used a Jenkins plugin called Summary Display, which displays a summary of the complete job.
The shell script command looks something like this:
sudo docker exec deploy-dl-code cat /dlcode/cnn_resultbestaccuracy.txt
So guys, this is the final step of my article: a fully automated DevOps CI/CD pipeline wrapped around the deep learning code. It took me around 17 hours to complete this project and write an article on it.
Thank you guys for taking your free time to read this article...