Bridging the Data Gap: How Eikon Therapeutics is Enhancing Drug Discovery with AI

Adi Hanuka, PhD

Associate Director @EikonTX | Physics<>ML | Ex-SLAC/Stanford | Forbes 30 Under 30

Published Sep 25, 2024

** Based on this work presented at the AI in Drug Discovery conference in April 2024.

At Eikon Therapeutics, we're on a mission to revolutionize drug discovery using the latest advancements in AI. One critical step in this process is predicting how well a potential drug will be absorbed in the human body. A standard laboratory proxy for this is Caco-2 permeability, and predicting it accurately has long been a challenge. In our recent study, we explored how AI can help overcome this hurdle, and the results are promising.

Understanding Caco-2 Permeability

Before an orally administered drug can be effective, it needs to be absorbed by the intestines and enter the bloodstream. Scientists use Caco-2 cell models, monolayers of human colon-derived cells that mimic the intestinal lining, to predict this absorption. Think of the assay as a gatekeeper that tells us which drugs are likely to pass through the intestinal wall. Accurately predicting this permeability can save time and resources in drug development.

The Challenge

Traditional methods of predicting Caco-2 permeability rely on large datasets of known compounds. However, these datasets are often limited and don't cover the full range of chemical diversity found in potential new drugs. This is where AI comes into play. By training deep learning models on existing data, we can predict the permeability of new compounds. But here's the catch—these models often struggle when faced with new data that doesn't fit the patterns they were trained on.

Figure: Distribution of the external and internal Caco-2 data.
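One rough way to check whether a new compound falls outside the patterns seen during training is to measure its similarity to the nearest training compound. The sketch below is illustrative and is not the analysis used in the study: it assumes each compound has already been reduced to a set of fingerprint bit indices by a cheminformatics toolkit, and the function names and toy fingerprints are hypothetical.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto (Jaccard) similarity between two sets of fingerprint on-bits."""
    union = len(bits_a | bits_b)
    if union == 0:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(bits_a & bits_b) / union


def nearest_training_similarity(query_bits, training_fingerprints):
    """Highest similarity between a query compound and any training compound."""
    return max(tanimoto(query_bits, fp) for fp in training_fingerprints)


# Toy fingerprints (made-up bit indices, not real molecules).
public_training = [{1, 2, 3, 4}, {2, 3, 5, 8}, {1, 4, 6, 7}]
familiar_compound = {1, 2, 3, 9}
novel_compound = {20, 21, 22, 3}

familiar_score = nearest_training_similarity(familiar_compound, public_training)
novel_score = nearest_training_similarity(novel_compound, public_training)
```

A compound whose nearest-neighbor similarity is low, like `novel_compound` here, is one for which a model trained only on the public set is more likely to be extrapolating.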

Our Approach

Instead of relying solely on public datasets, we combined them with a small amount of our own internal data. This internal data, although limited, provided a more accurate reflection of the types of compounds we are interested in.
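As a minimal sketch of how a small slice of internal data might be set aside for training while the rest is kept for evaluation, consider the helper below. The random split, the `split_internal` name, the default fraction, and the toy records are all assumptions for illustration; the article does not describe how the internal data was actually partitioned.

```python
import random


def split_internal(records, train_fraction=0.2, seed=0):
    """Randomly split internal records into a small training slice and a held-out set."""
    if not 0.0 <= train_fraction <= 1.0:
        raise ValueError("train_fraction must be between 0 and 1")
    shuffled = list(records)               # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)  # seeded for a reproducible split
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


# Toy records: (compound id, measured permeability); values are made up.
public_data = [(f"pub{i}", 0.1 * i) for i in range(50)]
internal_data = [(f"int{i}", 0.5 + 0.1 * i) for i in range(10)]

internal_train, internal_test = split_internal(internal_data, train_fraction=0.2)

# Combined training set: all public data plus the small internal slice.
combined_training = public_data + internal_train
```

Keeping `internal_test` completely separate from training is what makes the later evaluation on internal compounds meaningful.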

We used three different AI models for our study:

Chemprop: This model uses a message-passing neural network that operates directly on the molecular graph and learns from a large set of known compounds. It's like teaching a student with a vast library of books.
Chemprop + RDKit: We enhanced the first model with additional RDKit-derived features that help it understand chemical properties better. Imagine giving our student extra study guides.
ChemBERTa2: This model uses a transformer pretrained on a massive dataset of chemical structures. It's like having a super-smart student who has read every book in the library.

The Results

Initially, our AI models struggled with our internal data because it was quite different from the data they were trained on. However, when we fine-tuned these models with a small portion (just 20%) of our internal data, the results were remarkable.

Chemprop: Performance improved significantly, making it much better at predicting permeability.

Chemprop + RDKit: Showed even greater improvement, becoming our best performer.

ChemBERTa2: Also improved, though not as dramatically as the other two.

This fine-tuning process is like giving our students a few key lessons from a new textbook—they quickly adapted and performed much better on the test.
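To make the idea of fine-tuning concrete, here is a deliberately tiny sketch. It is not Chemprop or ChemBERTa2: it assumes a one-feature linear model and synthetic data in which the "internal" compounds follow a shifted relationship. All names and numbers are hypothetical. The point is only the mechanics: train on public data, then continue training from those parameters on a small internal slice.

```python
def mse(w, b, data):
    """Mean squared error of the line y = w * x + b on (x, y) pairs."""
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)


def train(w, b, data, learning_rate, epochs):
    """Full-batch gradient descent on mean squared error, starting from (w, b)."""
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Synthetic stand-ins: public data follows y = 2x, internal data follows y = 2x + 1.
public_data = [(i / 4, 2 * (i / 4)) for i in range(5)]
internal_data = [(1 + i / 10, 2 * (1 + i / 10) + 1) for i in range(10)]

finetune_set = [internal_data[0], internal_data[5]]  # 2 of 10 points (20%)
held_out = [p for p in internal_data if p not in finetune_set]

# Pretrain on public data only, then score on held-out internal data.
w, b = train(0.0, 0.0, public_data, learning_rate=0.1, epochs=500)
mse_before = mse(w, b, held_out)

# Fine-tune: continue from the pretrained parameters on the small internal slice.
w, b = train(w, b, finetune_set, learning_rate=0.05, epochs=200)
mse_after = mse(w, b, held_out)
```

In this toy setting the held-out error drops after fine-tuning because the two internal points are enough to correct the offset the public data never revealed.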

Figure: The Chemprop + RDKit model gradually improved toward R² = 0.7 with a training set containing ~20% internal data.
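For readers unfamiliar with the metric, R² (the coefficient of determination) compares a model's squared error with that of always predicting the mean; 1.0 is a perfect fit and 0.0 is no better than the mean. A minimal sketch of the standard formula follows, with made-up numbers rather than values from the study.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all true values are identical")
    return 1 - ss_res / ss_tot


# Made-up measured values and two sets of predictions.
measured = [1.0, 2.0, 3.0]
perfect_predictions = [1.0, 2.0, 3.0]
mean_predictions = [2.0, 2.0, 2.0]

r2_perfect = r2_score(measured, perfect_predictions)
r2_mean = r2_score(measured, mean_predictions)
```

Note that R² can be negative when predictions are worse than the mean baseline, which is a common symptom when a model is evaluated on data unlike its training set.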

Why It Matters

These findings highlight the importance of using representative data in training AI models. Even a small amount of relevant internal data can drastically improve the accuracy of predictions. This not only speeds up the drug discovery process but also makes it more reliable, potentially leading to faster development of new, life-saving medications.

Looking Ahead

While our study shows significant progress, there's always room for improvement. We'll continue refining our models and exploring new ways to enhance their accuracy. Our goal is to make drug discovery faster, cheaper, and more effective, ultimately bringing better medicines to market sooner.

At Eikon Therapeutics, we're excited about the applications of AI in drug discovery and healthcare in general. Bridging the data gap with AI enhances the accuracy and cost-effectiveness of drug discovery, ultimately leading to better therapeutic outcomes.

Stay connected for more updates on our journey!

Srijit Seal

AFHEA, AMRSC, FCPS, PhD (Cantab), MPhil (Cantab) | Research Associate, Broad Institute of MIT and Harvard | University of Cambridge | Cheminformatics Consultant | Former President of the Graduate Student Body

1mo

The linked work at the beginning is not public, so I can't see the details, but it would be interesting to see how you split the internal data for fine-tuning and what information it could learn from the out-of-distribution data.
