Building an AI tool: A no-nonsense guide for product teams

Maria Novikova

CRO at Xenoss | Enterprise AI and data engineering | Top 100 software companies on Inc. 5000

Published Feb 29, 2024

Developing AI models used to be highly hardware- and cost-intensive. As compute has grown cheaper in line with Moore’s Law, deploying machine learning algorithms has become more affordable. The growth of open source has also made it easier for early-stage companies to access the latest innovations.

The growing adoption of artificial intelligence is steadily turning it from a nice-to-have into a standard. In AdTech, AI-enabled ad spending is topping $370 billion, and companies like Jasper have become breakout successes.

In this edition of the MadTech Digest, I want to dive deeper into the best practices and workflows tech teams should adopt to flesh out a useful business case for AI and take the solution from concept to implementation.

Here is what I will cover:

Key machine learning use cases
The state of the AI market
Step-by-step guide to deploying AI models

For a deep dive into the roles and responsibilities of an AI engineer, check out the previous edition of MadTech Digest. Subscribe to the newsletter to follow new editions and stay updated with MarTech and AdTech industry news.

How do businesses leverage machine learning?

2023 was the year of generative AI. Following the success of OpenAI with ChatGPT, big tech companies shipped large language models, helping empower an impressive ecosystem of tools in various fields.

With so much of the spotlight on generative AI, it’s easy to forget about other value-generating subsets of machine learning: computer vision, robotic process automation, predictive analytics, and more.

Here’s a summary of practical ways to use machine learning in projects.

A graph detailing common AI applications across industries — Artificial intelligence capabilities help automate operations, increase security, and create better user or shopper experiences

A helpful mindset shift for finding the right use case for AI is focusing on the problem instead of your business idea or the underlying technology. By deeply understanding the challenge your team is trying to solve, you will bring the project one step closer to product-market fit.

For example, using AI to facilitate cross-department communication and collaboration is a promising application.

Quote from Pankaj Rajan, co-founder of MarkovML, describing the importance of AI in facilitating collaboration — Pankaj Rajan, Co-founder of MarkovML, points out that collaboration tools are a promising AI application

Will the AI market become more competitive?

The data suggests it will. Boston Consulting Group (BCG) research states that 85% of business leaders plan to increase their investments in generative AI and other machine learning technologies. By 2027, companies are expected to spend over $151 billion on machine learning development (eight times more than they did in 2023).

Even as the market grows more crowded, there’s still room for innovative copilots and process automation tools.

How to build successful AI systems, from business case to implementation

The awareness of AI’s utility may be growing, but organizations' progress on adoption is far from satisfactory. According to an IBM survey, limited tech expertise, complex data operations, and ethical concerns slow the deployment of innovative features.

Organizations need a solid technical and operational foundation to build an AI tool the market will find worthwhile and support it through the journey of growth and scalability.


Based on the experience of Xenoss developers in helping bring AI projects to the market, we would like to share our framework for AI product concept design and implementation.

Step 1: Make a business case for AI

AI can be applied in many ways, and identifying which applications actually generate value separates successful projects from failed experiments.

One of the most reliable ways to design a case for AI adoption is problem-first thinking. Ask yourself: “What operations in my industry can be automated?” “Where do users spend most time and effort?” “How can AI help address these challenges?”

Once you have shortlisted 5-10 promising ideas, look at the market to see which AI copilots are successful and understand what supported their growth. Based on the results of this market research, develop a list of metrics for evaluating the success of your AI use case.

Step 2: Build a dataset and prepare the data for model training

To produce accurate results, AI models rely on training data. An algorithm can make precise predictions only if it learns from a large volume of correctly filtered and structured information.

The amount of data a model needs depends on the range of tasks it has to accomplish and the topics it should be able to navigate. Large language models like those behind ChatGPT are very data-intensive (they require between 570 GB and 45 TB of data for training). The good news is that smaller models need less data to make accurate predictions. A leaner model also has performance and speed benefits, giving users a better experience.

Since GDPR, COPPA, and other privacy laws limit how freely companies can collect data from users, building robust datasets is becoming more complex. In addition, all collected data must be protected with privacy-preserving techniques such as aggregation, anonymization, homomorphic encryption, and federated learning.
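As a rough illustration of two of the techniques named above, the sketch below pseudonymizes user identifiers with a keyed hash and aggregates events into per-region counts, suppressing groups that are too small to report safely. The field names, the secret key, and the minimum group size of 3 are hypothetical choices made for the example, and keyed hashing on its own should not be assumed to meet the legal bar for anonymization.

```python
import hashlib
import hmac
from collections import defaultdict

# Hypothetical secret; in practice it would come from a secrets manager.
SECRET_KEY = b"replace-with-a-real-secret"


def pseudonymize(user_id: str) -> str:
    """Replace a raw identifier with a keyed SHA-256 digest."""
    return hmac.new(SECRET_KEY, user_id.encode("utf-8"), hashlib.sha256).hexdigest()


def aggregate_by_region(events, min_group_size=3):
    """Count distinct users per region, dropping groups below min_group_size."""
    users_per_region = defaultdict(set)
    for event in events:
        users_per_region[event["region"]].add(pseudonymize(event["user_id"]))
    return {
        region: len(users)
        for region, users in users_per_region.items()
        if len(users) >= min_group_size
    }


events = [
    {"user_id": "u1", "region": "EU"},
    {"user_id": "u2", "region": "EU"},
    {"user_id": "u3", "region": "EU"},
    {"user_id": "u1", "region": "EU"},  # repeat visit by the same user
    {"user_id": "u4", "region": "US"},  # group of one, will be suppressed
]
report = aggregate_by_region(events)
print(report)
```

The downstream dataset then contains only digests and counts, so the training pipeline never needs to see raw identifiers.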

To enable machine learning while keeping data collection to a minimum, tech teams leverage data enrichment practices, such as:

Using open-source datasets to train ML models. It’s common for governments (e.g., Data.gov), research institutions (e.g., CERN datasets), and corporations (e.g., Kaggle datasets, Google datasets) to share anonymized datasets that tech teams can use to train and fine-tune models.
Joining forces with other businesses in your industry to enrich ML datasets. For example, Mastercard partnered with nine banks, including Halifax, Lloyds Bank, Bank of Scotland, Monzo Bank, and TSB Bank, to improve the accuracy of fraud detection algorithms.
Using synthetic datasets. For ML applications that handle sensitive data (for example, in finance or healthcare), training models on computer-generated datasets is a reasonable way to limit data collection. It’s estimated that synthetic data will make up as much as 60% of the data used by algorithms in these fields.
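To make the synthetic data idea concrete, here is a deliberately naive sketch: it summarizes each column of a small real dataset and samples new rows from those summaries. The field names and values are invented, and sampling columns independently ignores correlations between them. Production-grade generators are far more sophisticated, and a simple approach like this one should not be assumed to provide any privacy guarantee.

```python
import random
import statistics


def fit_column_stats(rows, numeric_fields, categorical_fields):
    """Summarize each column independently (cross-column correlations are lost)."""
    stats = {}
    for field in numeric_fields:
        values = [row[field] for row in rows]
        stats[field] = (statistics.mean(values), statistics.stdev(values))
    for field in categorical_fields:
        # Keeping the observed values preserves their relative frequencies.
        stats[field] = [row[field] for row in rows]
    return stats


def sample_synthetic(stats, numeric_fields, categorical_fields, n, seed=0):
    """Draw n synthetic rows from the fitted per-column summaries."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        row = {}
        for field in numeric_fields:
            mean, std = stats[field]
            row[field] = max(0.0, rng.gauss(mean, std))  # spend cannot be negative
        for field in categorical_fields:
            row[field] = rng.choice(stats[field])
        rows.append(row)
    return rows


real_rows = [
    {"spend": 120.0, "segment": "retail"},
    {"spend": 80.0, "segment": "travel"},
    {"spend": 100.0, "segment": "retail"},
    {"spend": 95.0, "segment": "retail"},
]
stats = fit_column_stats(real_rows, ["spend"], ["segment"])
synthetic_rows = sample_synthetic(stats, ["spend"], ["segment"], n=5)
print(synthetic_rows)
```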

Having a robust dataset solves only half of the problem. Filtering and labeling training data is an equally important challenge. Before applying a dataset to an algorithm, ensure all data items follow a consistent taxonomy and are cataloged and cleansed.
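A minimal sketch of that cleanup step, assuming records arrive as Python dictionaries. The taxonomy mapping, field names, and sample rows are invented for the example.

```python
# Hypothetical taxonomy: maps label variants to one canonical channel name.
CHANNEL_TAXONOMY = {
    "ctv": "ctv",
    "connected tv": "ctv",
    "display": "display",
    "banner": "display",
    "video": "video",
}
REQUIRED_FIELDS = ("campaign_id", "channel", "clicks")


def clean_records(records):
    """Drop incomplete, unmapped, and duplicate rows; normalize channel labels."""
    cleaned, seen = [], set()
    for record in records:
        # Skip rows with missing required values.
        if any(record.get(field) is None for field in REQUIRED_FIELDS):
            continue
        channel = CHANNEL_TAXONOMY.get(str(record["channel"]).strip().lower())
        if channel is None:  # label falls outside the taxonomy
            continue
        key = (record["campaign_id"], channel)
        if key in seen:  # duplicate campaign/channel pair
            continue
        seen.add(key)
        cleaned.append({**record, "channel": channel})
    return cleaned


raw = [
    {"campaign_id": 1, "channel": "Connected TV", "clicks": 120},
    {"campaign_id": 1, "channel": "ctv", "clicks": 120},     # duplicate once normalized
    {"campaign_id": 2, "channel": " Banner ", "clicks": 45},
    {"campaign_id": 3, "channel": "radio", "clicks": 10},    # not in the taxonomy
    {"campaign_id": 4, "channel": "video", "clicks": None},  # missing value
]
cleaned = clean_records(raw)
print(cleaned)
```

Whether unmapped labels should be dropped, as here, or routed to a review queue is a design choice that depends on how costly lost records are for the use case.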

Graph showing key AI development challenges tech teams are facing — 32% of machine learning teams reported struggling with data management when training algorithms

To address data management challenges, tech teams need to focus on building high-performance data pipelines. A robust infrastructure will help teams manage more data in real time. The Trade Desk, for example, can process over 800 billion daily queries thanks to a reliable infrastructure built with Amazon Web Services (AWS) and Aerospike.

Infographic lists the elements of a data warehousing pipeline — Critical components of a data warehousing pipeline

Step 3: Choose ML techniques for your model

The range of machine learning techniques teams can rely on depends on their chosen approach—supervised or unsupervised learning, deep learning, foundation models, etc.

Each type of machine learning model relies on specific algorithmic techniques that guide the algorithm from raw data to an understandable outcome.

If you want to learn more about the models used for supervised, semi-supervised, and unsupervised learning, as well as deep learning and foundation models, check out these sources:

Machine learning models for supervised learning
Techniques used for semi-supervised learning
Algorithms for unsupervised learning
Guide to deep learning models
Review of foundation models

The details of your use case and the amount of available data and engineering resources typically determine the choice between supervised, semi-supervised, and unsupervised learning.

Please take a look at the chart below for a quick recap of widely used techniques for training machine learning models.

A table describing subsets of machine learning, associated techniques, and use cases — Types of machine learning strategies and associated techniques

Step 4: Train and validate your models

Training a machine learning model is a pivotal moment that determines the algorithm's accuracy in the long run. In the process, tech teams often face the following challenges:

Training-serving skew: A model is trained under conditions different from those under which it is deployed. As a result, live predictions can be less accurate than training tests suggest.
Data drift: A change in the distribution of input data degrades the algorithm's performance. For example, if an ad campaign performance monitoring algorithm has little data about CTV ads, predictions regarding these campaigns will be poorer compared to other channels. As long as CTV ads make up a small share of total ads, the impact of these inaccuracies on the model’s performance is barely noticeable. But if there’s a surge in CTV ads, the significance of inaccurate predictions will increase, leading to poor overall performance.
Concept drift: The context around the data changes, but the model is not updated. For example, suppose a model is asked to calculate the costs of running ad campaigns in specific regions over time. Cost trends can change as a region develops, so a model that is not updated to reflect the new context can no longer predict accurately.
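One common way to catch data drift before it hurts accuracy is to compare the distribution of an input at training time with what the model sees in production. The sketch below computes the Population Stability Index (PSI) over the share of ads per channel, echoing the CTV example. The channel shares are made up, and the 0.2 alert threshold is a widely quoted rule of thumb rather than a fixed standard.

```python
import math


def population_stability_index(expected, actual, epsilon=1e-6):
    """PSI between two categorical distributions given as {category: share}."""
    psi = 0.0
    for category in set(expected) | set(actual):
        # epsilon avoids log(0) for categories missing on one side.
        e = max(expected.get(category, 0.0), epsilon)
        a = max(actual.get(category, 0.0), epsilon)
        psi += (a - e) * math.log(a / e)
    return psi


training_shares = {"display": 0.60, "video": 0.35, "ctv": 0.05}
stable_shares = {"display": 0.58, "video": 0.36, "ctv": 0.06}
surge_shares = {"display": 0.40, "video": 0.25, "ctv": 0.35}

DRIFT_THRESHOLD = 0.2  # rule-of-thumb alert level, an assumption

for name, live_shares in [("stable", stable_shares), ("ctv surge", surge_shares)]:
    psi = population_stability_index(training_shares, live_shares)
    print(f"{name}: PSI={psi:.3f}, drift={psi > DRIFT_THRESHOLD}")
```

A check like this flags the CTV surge as soon as the traffic mix shifts, which gives the team time to collect more CTV data and retrain before overall accuracy degrades.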

A graph describing the key types of concept drift — A summary of key concept drift types

How do you train better AI models?

To improve the performance of machine learning models, engineering teams can rely on the following practices:

Hyperparameter tuning: Training several versions of a model, each with a different set of hyperparameters, and choosing the most accurate one.
Feature engineering: Selecting and transforming the input features a model learns from. SHAP values are a helpful tool for measuring how much each feature contributes to the model's predictions.
Checking data quality: Scanning for inconsistencies and outliers in datasets and checking the application of the data schema.
Using data validation tools like Pydantic, Great Expectations, and others.
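A minimal sketch of hyperparameter tuning as a grid search, written in plain Python. It fits a one-feature ridge regression for several values of the regularization strength and keeps the value with the lowest error on held-out validation data. The data points and candidate values are invented, and real projects would typically rely on an established tuning library rather than a hand-rolled loop.

```python
def fit_ridge(points, lam):
    """Closed-form weight for y ~ w * x with L2 penalty lam (no intercept)."""
    sum_xy = sum(x * y for x, y in points)
    sum_xx = sum(x * x for x, _ in points)
    return sum_xy / (sum_xx + lam)


def mean_squared_error(points, weight):
    """Average squared prediction error of y ~ weight * x."""
    return sum((y - weight * x) ** 2 for x, y in points) / len(points)


train = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.9)]
validation = [(5.0, 5.1), (6.0, 5.8)]
candidate_lams = [0.0, 1.0, 10.0]  # hypothetical search grid

# Train one model per hyperparameter value and score it on held-out data.
scores = {}
for lam in candidate_lams:
    weight = fit_ridge(train, lam)
    scores[lam] = mean_squared_error(validation, weight)

best_lam = min(scores, key=scores.get)
print(f"best lambda: {best_lam}, validation MSE: {scores[best_lam]:.4f}")
```

Scoring on data the model did not train on is the important part: picking the best version by training error would reward overfitting.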

Step 5: Get ready for model deployment

Deploying a machine learning model seems fairly straightforward, yet this is the stage where teams tend to make the most errors. Poorly planned and executed deployment is one of the reasons why only 32% of algorithms make it to the market.

The reasons why ML model deployment fails can be summarized as follows:

Lack of technical infrastructure required to support stable operations
Insufficient technical support
Unclear metrics for determining the ROI of the algorithm
Failure to implement CI/CD principles
Lack of MLOps

A graph showing deployment rates across new, existing, and revolutionary ML initiatives — Deployment rates across ML initiatives are low across new, existing, and revolutionary ML initiatives

Ultimately, successful AI model deployments boil down to having effective processes. Just like DevOps principles of continuous integration (CI) and continuous delivery (CD) improve the deployment of regular software, MLOps increases the speed, efficiency, and predictability of AI model deployments.
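One way MLOps can make deployments more predictable is an automated promotion gate in the CI/CD pipeline: a candidate model ships only if it clears agreed quality and latency checks against the model currently in production. The sketch below shows a hypothetical gate. The model names, metric names, and thresholds are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelReport:
    name: str
    accuracy: float        # share of correct predictions on a holdout set
    p95_latency_ms: float  # 95th percentile inference latency


def should_promote(candidate, production, min_gain=0.01, max_latency_ms=200.0):
    """Return (decision, reasons) for promoting candidate over production."""
    reasons = []
    if candidate.accuracy < production.accuracy + min_gain:
        reasons.append("accuracy gain below threshold")
    if candidate.p95_latency_ms > max_latency_ms:
        reasons.append("latency above budget")
    return (not reasons, reasons)


production = ModelReport("ctr-model-v1", accuracy=0.81, p95_latency_ms=120.0)
candidate = ModelReport("ctr-model-v2", accuracy=0.84, p95_latency_ms=150.0)
slow_candidate = ModelReport("ctr-model-v3", accuracy=0.86, p95_latency_ms=340.0)

print(should_promote(candidate, production))
print(should_promote(slow_candidate, production))
```

Writing the criteria down as code also addresses the unclear-metrics problem listed above, since the team has to agree on what "better" means before a model can ship.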

Takeaway

Building machine learning models can be summarized in a step-by-step workflow. Its key points are identifying the use case, preparing data, choosing the right algorithm, training, and deploying a model.

Building a data pipeline and following MLOps practices will help increase the speed and accuracy of ML algorithms.

These steps can help teams build a scalable and efficient workflow for developing compelling machine-learning features.

What has been your experience with AI product development? What lessons have you learned in the process?
