INTEGRATION OF A DEEP LEARNING MODEL WITH CI/CD OPERATIONS USING JENKINS AND DOCKER TO AUTOMATE THE COMPLETE DELIVERY AND DEPLOYMENT OF THE MODEL
INTEGRATION OF DEEP LEARNING WITH DEV-OPS
According to Wikipedia, MLOps (a compound of “machine learning” and “operations”) is a practice for collaboration and communication between data scientists and operations professionals to help manage the production ML (or deep learning) lifecycle.
Many of the failures in Data Science are due to the failure of DEPLOYMENT OF THE MODEL into a real-world scenario.
Another problem faced by Data Scientists is that we cannot keep changing the hyper-parameters again and again to push the accuracy of the model higher. It is a manual process which is very tedious, since a Deep Learning model takes a lot of time to train.
So this article is all about automating deep learning with the DevOps world, so that nothing is left to be done manually and the machine chooses the hyper-parameters on its own while checking the accuracy (a small conceptual sketch of this loop follows the problem statement below).
Let me give a brief description of the task I have completed in this article. This is the problem statement that has been solved here:
1. Create a container image that has Python3 and Keras or numpy installed, using a Dockerfile. This container acts as an environment for the production of the deep learning model.
2. When we launch this image, it should automatically start training the model in the container.
3. Create a job chain of Job1, Job2, Job3, Job4 and Job5 using the Build Pipeline plugin in Jenkins.
4. Job1: Pull the GitHub repository automatically when a developer pushes code to GitHub.
5. Job2: By looking at the code or program file, Jenkins should automatically start the image container with the respective machine learning software and interpreter installed, deploy the code and start training (e.g. if the code uses a CNN, then Jenkins should start the container that already has all the software required for CNN processing installed).
6. Job3: Train the model and report the accuracy or metrics.
7. Job4: If the accuracy metric is less than 80%, tweak the machine learning model architecture.
8. Job5: Retrain the model, or if the accuracy is good enough, notify that the best model has been created.
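To make the overall flow concrete, here is a plain-Python illustration of the retraining loop that Jobs 3, 4 and 5 implement together. This is only a conceptual sketch; the fake train_model() function, the extra_dense_layers counter and the 90% threshold (the value used later in Job4) are stand-ins for what the real Jenkins jobs and the cnn.py/filehandling.py scripts do.

TARGET_ACCURACY = 90.0   # the threshold Job4 checks against

def train_model(extra_dense_layers):
    # Stand-in for Job3: the real pipeline trains the CNN inside the container;
    # here we just simulate an accuracy that improves with every tweak.
    return 85.0 + 2.0 * extra_dense_layers

extra_dense_layers = 0
accuracy = train_model(extra_dense_layers)              # Job3: first training run
while accuracy < TARGET_ACCURACY:                       # Job4: compare the metric
    extra_dense_layers += 1                             # Job4: tweak the architecture
    accuracy = train_model(extra_dense_layers)          # Job3 again: retrain
print("Best model created with accuracy:", accuracy)    # Job5: report the result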
Before starting, I have created a Docker image in which I have configured Miniconda, which provides almost all the software and packages needed for implementing a Deep Learning model. This Docker image creates its own environment where the model can be trained. You can see my Docker image on Docker Hub.
You can also pull this image by simply typing:
docker pull nikhilgubbi/centos-miniconda3
The Dockerfile is straightforward: it starts from a CentOS base image and installs Miniconda3 on top of it.
Job1 - Pulling the Code from GitHub to the RHEL8 Machine using Poll SCM Triggering.
On the base RHEL8 VM, I have configured the working directory with git, and it pushes the code automatically using the hooks concept in git.
Whenever we commit any file, the code is pushed to the GitHub repository automatically. Here, we have created a CNN model on the MNIST dataset to predict handwritten digits.
All the code is provided in the GitHub repository.
Let's look at the configuration of Job1.
In the end, after downloading the code, this job copies all the files to a specified location. We have to use the following command in the Jenkins build's Execute Shell:
sudo cp * /root/deploy-dl-code
[ Directory where my code will be copied]
After the successful completion of Job1, Job2 will be triggered automatically.
Job2 - Building the Container (Environment) Specified for the Model Code and its Requirements.
In Job2, I have automated the process of building the container from the Miniconda Docker image. Besides building the container with all the software, I have made a requirements.txt file in which I have specified all the packages required for training our model (for example, keras and numpy). The requirements.txt file is also copied from GitHub to the target location. Let's see how Job2 works.
Here, Job2 is the downstream of Job1 and the upstream of Job3.
The complete shell script is given below:
echo
sleep 2s
# Pull the image only if it is not already present on the system
if ! sudo docker images | grep centos-miniconda3
then
    echo "++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++"
    echo "                             pulling image                             "
    echo "++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++"
    sudo docker pull nikhilgubbi/centos-miniconda3:8
    sleep 4s
    echo
fi
sleep 2s
# Start the container if it already exists, otherwise create it
if sudo docker ps -a | grep deploy-dl-code
then
    echo
    sleep 2s
    echo "+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++"
    echo "               Starting Existing Container deploy-dl-code               "
    echo "+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++"
    sudo docker start deploy-dl-code
    sleep 2s
    echo
else
    sleep 2s
    echo "++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++"
    echo "                  Building Container deploy-dl-code                    "
    echo "++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++"
    sudo docker run -dit --name deploy-dl-code -v /root/deploy-dl-code/MLOPS-AUTOMATION/:/dlcode nikhilgubbi/centos-miniconda3:8
    sleep 5s
    echo
fi
echo "++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++"
echo "         Installing all required packages from requirements.txt        "
echo "++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++"
echo
sleep 2s
sudo docker exec deploy-dl-code /root/miniconda3/condabin/conda install -n tensorflow --yes --file /dlcode/requirements.txt
The above job is designed in such a way that:
1. If the Docker image is not present on the system, it will be downloaded automatically from my Docker Hub repository. If it already exists, the job goes straight to building the container.
2. If the container is not present, it launches a container named deploy-dl-code and mounts the volume where our code is present into the container. If the container is already present, it won't be created again; if it is stopped, it will be started automatically.
3. After the container is built, the last command installs all the required packages inside the container, in the tensorflow environment by default.
After this completes, Job3 will be triggered automatically.
Job3 - Training our model inside the container.
In Job3, Jenkins starts training the model inside the container.
Here, Job3 is the downstream of Job2 and the upstream of Job4.
The shell command that makes the container start training the model (cnn.py) is:
sudo docker exec deploy-dl-code /root/miniconda3/envs/tensorflow/bin/python3 /dlcode/cnn.py
[ Path of my code]
After training the model, it stores the accuracy result in the cnn_resultbestaccuracy.txt file.
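The full cnn.py is in the GitHub repository and is not reproduced in this article, but based on the description (a Keras CNN trained on the MNIST digits that writes its accuracy to a text file) a minimal sketch could look like the following. The exact layer layout, epoch count and output path here are my assumptions, not the author's original code.

# Hypothetical sketch of cnn.py: train a small CNN on MNIST and write the
# test accuracy to a text file so that Job4 can read and compare it.
from tensorflow.keras.datasets import mnist
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense
from tensorflow.keras.utils import to_categorical

# Load and normalise the MNIST digit data
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train = x_train.reshape(-1, 28, 28, 1).astype("float32") / 255.0
x_test = x_test.reshape(-1, 28, 28, 1).astype("float32") / 255.0
y_train, y_test = to_categorical(y_train), to_categorical(y_test)

# A small CNN; Job4's filehandling.py would grow the Dense part of this model
model = Sequential([
    Conv2D(32, (3, 3), activation="relu", input_shape=(28, 28, 1)),
    MaxPooling2D((2, 2)),
    Flatten(),
    Dense(128, activation="relu"),
    Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
model.fit(x_train, y_train, epochs=1, batch_size=128)

# Save the accuracy (as a percentage) so the Jenkins jobs can compare it to the target
_, accuracy = model.evaluate(x_test, y_test, verbose=0)
with open("/dlcode/cnn_resultbestaccuracy.txt", "w") as f:
    f.write("%f" % (accuracy * 100))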
After its completion, Job4 will be triggered automatically.
Job4 - Tweaking the Model again to get the highest Accuracy.
In Job4, I have built the logic by which we can automatically improve the accuracy of our CNN model without changing the code manually. It took me around 6 hours to design this logic, which is the most interesting part of this article and is where the real-world use case actually gets solved.
In Job3, after the model is trained successfully, the accuracy result is saved in the cnn_resultbestaccuracy.txt file. In the first go, if the model gives an accuracy of less than 90%, Job4 runs another file called filehandling.py, which adds a Dense layer inside the main code. After this, the shell script invokes Job3 again to start building the model once more. For invoking Job3 from the shell, I have used the concept of triggering builds remotely with an authentication token.
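The article does not show filehandling.py itself, so here is a minimal sketch of the idea under the stated assumptions: read cnn.py, insert one more Dense layer just before the output layer, and write the file back so that the next run of Job3 trains a slightly deeper model. The marker line, layer size and file path are hypothetical.

# Hypothetical sketch of filehandling.py: tweak cnn.py by inserting an extra
# Dense layer in front of the output layer, then let Job3 retrain the model.
CODE_PATH = "/dlcode/cnn.py"                          # code mounted into the container
EXTRA_LAYER = '    Dense(64, activation="relu"),\n'   # layer added on every tweak
MARKER = 'Dense(10, activation="softmax")'            # the existing output layer

with open(CODE_PATH, "r") as f:
    lines = f.readlines()

# Insert the new layer just above the output layer of the Sequential model
for i, line in enumerate(lines):
    if MARKER in line:
        lines.insert(i, EXTRA_LAYER)
        break

with open(CODE_PATH, "w") as f:
    f.writelines(lines)

print("Added one more Dense layer to", CODE_PATH)

Because Job3 is triggered again after this script runs, every failed accuracy check grows the network by one more layer until the target accuracy is reached.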
In simple words, Job4 works something like this:
a. If the model accuracy is less than 90%, the shell script runs the filehandling.py file and then invokes Job3 to train the model again. This process repeats until we get an accuracy of more than 90%.
b. Once the accuracy is above 90%, it comes out of the loop and the tweaking process stops. Here is the shell script command:
train=$(sudo cat /root/deploy-dl-code/test.txt)
pred=90.000000
st=`echo "$train < $pred" | bc`
if [ $st -eq 1 ]; then
    ## If accuracy is not the desired accuracy
    echo "Tweaking Model again by triggering Job3"
    sudo docker exec deploy-dl-code /root/miniconda3/envs/tensorflow/bin/python3 /dlcode/filehandling.py
    curl -X POST http://192.168.99.111:8080/job/Job3-Train_Model/build?token=job3 --user nikhilgubbi:11cecf8d41bce413ad35249c815a28e2a8
else
    ## If accuracy is greater than the desired one
    echo "Model Successfully tweaked and your Accuracy is improved"
fi
After its Completion, Job5 will automatically get triggered.
Job5 - Display Best Accuracy.
Finally, in Job5, Jenkins displays the accuracy result. For this, I have used a Jenkins plugin called Summary Display, which displays a summary of the complete job.
The shell script command looks something like this:
sudo docker exec deploy-dl-code cat /dlcode/cnn_resultbestaccuracy.txt
So guys, this is the final step of my article: a fully automated DevOps CI/CD pipeline wrapped around the deep learning code. It took me around 17 hours to complete this project and write an article on it.
Thank you guys for taking your free time to read this article...