Bridging the Data Gap: How Eikon Therapeutics is Enhancing Drug Discovery with AI
Image created using Microsoft Design Tool. Credit: Colin Sanford.

Based on this work presented at the AI in Drug Discovery conference in April 2024.

At Eikon Therapeutics, we're on a mission to revolutionize drug discovery using the latest advancements in AI. One critical step is predicting how well a potential drug will be absorbed in the human body, which is commonly estimated in the lab by measuring Caco-2 permeability. Making these predictions accurately has always been a challenge. In our recent study, we explored how AI can help overcome this hurdle, and the results are promising.

Understanding Caco-2 Permeability 

Before an orally administered drug can be effective, it needs to be absorbed by the intestines and enter the bloodstream. Scientists use Caco-2 cell models, layers of cultured human intestinal epithelial cells, to predict this absorption. Think of the cell layer as a gatekeeper that tells us which drugs are likely to pass through the intestinal wall. Accurately predicting this permeability can save time and resources in drug development.


The Challenge 

Traditional methods of predicting Caco-2 permeability rely on datasets of compounds whose permeability has already been measured. However, these datasets are often limited in size and don't cover the full range of chemical diversity found in potential new drugs. This is where AI comes into play. By training deep learning models on existing data, we can predict the permeability of new compounds. But here's the catch: these models often struggle when faced with new compounds that don't resemble the ones they were trained on.

Distribution of the external and internal Caco-2 data
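
One common way to put a number on this kind of mismatch is to ask how similar a new compound is to its closest neighbour in the training set. The sketch below is an illustration only, not the analysis behind the figure above. It assumes each compound has already been reduced to a binary fingerprint, represented here as a set of on-bit indices; the fingerprints and the function names `tanimoto` and `nearest_training_similarity` are made up for this example.

```python
from typing import FrozenSet, Iterable

Fingerprint = FrozenSet[int]  # indices of the "on" bits of a binary fingerprint


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto (Jaccard) similarity between two sets of on-bit indices."""
    if not a and not b:
        return 1.0  # convention: two empty fingerprints count as identical
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def nearest_training_similarity(
    query: Fingerprint, training: Iterable[Fingerprint]
) -> float:
    """Highest Tanimoto similarity between a query and any training compound."""
    return max((tanimoto(query, fp) for fp in training), default=0.0)


# Toy, made-up fingerprints; real ones would come from a cheminformatics toolkit.
public_set = [frozenset({1, 2, 3, 4}), frozenset({2, 3, 5, 8})]
internal_compound = frozenset({3, 20, 21, 22})

similarity = nearest_training_similarity(internal_compound, public_set)
print(f"nearest-neighbour similarity: {similarity:.2f}")
```

A low value for many internal compounds would suggest that the public training data covers a different region of chemical space, which is the situation described above.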

Our Approach 

Instead of relying solely on public datasets, we combined them with a small amount of our own internal data. This internal data, although limited, provided a more accurate reflection of the types of compounds we are interested in. 

We used three different AI models for our study: 

  1. Chemprop: This model uses a message-passing neural network that operates on molecular graphs to learn from a large set of known compounds. It's like teaching a student with a vast library of books. 
  2. Chemprop + RDKit: We enhanced the first model with additional molecular descriptors computed with RDKit, which help it understand chemical properties better. Imagine giving our student extra study guides. 
  3. ChemBERTa2: This model uses a transformer pretrained on a massive dataset of chemical structures. It's like having a super-smart student who has read every book in the library. 


The Results 

Initially, our AI models struggled with our internal data because it was quite different from the data they were trained on. However, when we fine-tuned these models with a small portion (just 20%) of our internal data, the results were remarkable. 

  • Chemprop: Performance improved significantly, making it much better at predicting permeability. 

  • Chemprop + RDKit: Showed even greater improvement, becoming our best performer. 

  • ChemBERTa2: Also improved, though not as dramatically as the other two. 

This fine-tuning process is like giving our students a few key lessons from a new textbook—they quickly adapted and performed much better on the test. 
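
To make the idea of setting aside 20% of the internal data concrete, here is a minimal sketch of such a split. It assumes a simple seeded random split, which is not necessarily how the study divided its data (a scaffold-based or time-based split would be a stricter alternative). The record layout and the function name `split_for_fine_tuning` are hypothetical.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_for_fine_tuning(
    records: Sequence[T], fraction: float = 0.2, seed: int = 0
) -> Tuple[List[T], List[T]]:
    """Randomly split records into (fine_tune, held_out) subsets."""
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must be strictly between 0 and 1")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    cut = max(1, round(len(shuffled) * fraction))
    return shuffled[:cut], shuffled[cut:]


# Hypothetical internal records: (compound_id, measured permeability value).
internal = [(f"cmpd-{i}", -5.0 - 0.01 * i) for i in range(50)]
fine_tune, held_out = split_for_fine_tuning(internal, fraction=0.2, seed=42)
print(len(fine_tune), len(held_out))
```

The fine-tuning subset would be added to the training data, while the held-out subset stays untouched so that the reported accuracy reflects compounds the model has never seen.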

The Chemprop + RDKit model gradually improved towards R² = 0.7 with a training set containing ~20% internal data.
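
For readers unfamiliar with the metric, the sketch below shows how R² is usually computed, assuming the value in the caption above is the standard coefficient of determination (1 minus the ratio of residual to total sum of squares). The measured and predicted values are made up for illustration and are not results from the study.

```python
from typing import Sequence


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all measured values are identical")
    return 1.0 - ss_res / ss_tot


# Made-up permeability values, for illustration only.
measured = [-5.2, -4.8, -6.1, -5.5]
predicted = [-5.0, -4.9, -5.8, -5.6]
print(round(r_squared(measured, predicted), 3))
```

A value of 1 means the predictions match the measurements exactly, while a value of 0 means the model does no better than always predicting the average measured value.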

 

Why It Matters 

These findings highlight the importance of using representative data in training AI models. Even a small amount of relevant internal data can drastically improve the accuracy of predictions. This not only speeds up the drug discovery process but also makes it more reliable, potentially leading to faster development of new, life-saving medications. 


Looking Ahead 

While our study shows significant progress, there's always room for improvement. We'll continue refining our models and exploring new ways to enhance their accuracy. Our goal is to make drug discovery faster, cheaper, and more effective, ultimately bringing better medicines to market sooner. 

At Eikon Therapeutics, we're excited about the applications of AI in drug discovery and in healthcare more broadly. Bridging the data gap with AI enhances the accuracy and cost-effectiveness of drug discovery, ultimately leading to better therapeutic outcomes.

Stay connected for more updates on our journey!

Srijit Seal

AFHEA, AMRSC, FCPS, PhD (Cantab), MPhil (Cantab) | Research Associate, Broad Institute of MIT and Harvard | University of Cambridge | Cheminformatics Consultant | Former President of the Graduate Student Body

The work linked at the beginning is not public, so I can't see the details, but it would be interesting to see how you split the internal data for fine-tuning and what information the model could learn from the out-of-distribution data.
