Silicon UK In Focus EXTRA: Does More Data Equal Better Health?
Rabia has a background in immuno-genetics with a passion for bridging the worlds of science, business and technology. She completed her BSc in Biology and Economics from McGill University in Montreal, Canada. She went on to complete a PhD in Genetics, studying the genetics underlying the host immune response to Typhoid.
Dr Rabia Khan, MD Discovery Sciences at Sensyne Health.
Is the healthcare sector entering a new era where personal data becomes the basis of treatments and preventative care?
Data is increasingly powering healthcare, whether that is healthcare records (real-world data), real-time data captured from wearables, or remote monitoring solutions. The rapid advancement of analytical tools, such as machine learning and Big Data analytics, means that this explosion in healthcare data can be used to impact disease management, from prevention to management of care and personalized medicine.
Within healthcare systems, real-world data can be used to support clinicians in decision making and to power multi-variable decision-making algorithms that can help us better understand complex diseases. The use of big data and machine learning has helped us better understand the new diseases humanity is faced with, such as COVID-19.
Data is also pivotal to powering the industry charged with curing disease. Real-world data is crucial in putting patients at the heart of the drug discovery and development process. Thanks to the availability of patient data, we can truly move away from “bench to bedside” and towards a more patient-centric “bedside to bench” model of drug discovery.
Data is allowing us to redefine disease taxonomies, eventually moving away from a century-old ICD-10 code classification of disease to a mechanistic understanding of disease that is more aligned with the underlying pathology than with the signs and symptoms.
Using data from real-world data sets, multi-omics and large patient biobanks, we can redefine disease taxonomy. We are finally at a pivotal moment in history where we can better understand the underlying biology rather than targeting the symptoms.
An example is endometriosis: we do not understand the causes of a disease that affects 10% of women, with no current treatment options available. The plethora of data will now allow us not only to better understand the condition, but also to monitor and track the disease over time using digital remote monitoring tools.
Longitudinal data across healthcare services are often fragmented. How can technology help us curate and integrate that data into meaningful datasets?
Today healthcare professionals are limited in the level of care they can deliver to patients due to the inability to see accurate and complete health records. EPRs (electronic patient records) are spread across many fragmented sources of data with multiple gatekeepers — primary care, secondary care, social care, occupational health at employers, specialist care centres, etc. — which all have their own data for the patient.
These different sources of data may be incomplete or overlapping. Despite this fragmentation, all these stakeholders share a common purpose: to help treat, cure and manage the patient. In a new data-driven world, the integration of these data sets to improve patient outcomes is pivotal, as the integrated data set provides a holistic picture of a patient journey, rather than the single-time-point, single-stakeholder data points held by each party in the patient pathway.
Data “wrangling” and “cleaning”, often the least discussed and most pivotal part of data analytics, is key in healthcare data sets. To make longitudinal data available for AI and clinical decision making, the linking, aggregation and storage have to be done in an ethical and clinically relevant manner, preserving patient privacy while ensuring clinical accuracy.
To do this effectively, we must build interdisciplinary teams that understand the relevance and importance of effective data wrangling, understand data dictionaries and standards, and include clinical and privacy experts. Technology has helped inform every step of this process. It has enabled us to standardize, harmonize and store data sets to make them amenable to analysis. Consortia have been established that are defining EPR data-linking standards, and a large number of companies are focused on privacy-preserving algorithms and anonymization tools to ensure the ethical use of patient data. Although this is a new, rapidly developing ecosystem of diverse companies, technology has already enabled us to transform the linking and analysis of data sets held across disparate databases to impact patient care.
Do we have the analytical capability to extract meaningful data from each patient dataset without bias or discrimination?
As AI continues to advance and the boundaries in which it operates expand, we will naturally begin to face challenges around bias. It’s an issue because, fundamentally, AI systems partly require training from humans, so it can be difficult to eliminate 100% of the bias that exists in the data sets we are using to train the algorithms.
We should make a concerted effort to ensure the data sets used to train AI algorithms adequately cover global diversity. Inherently, this has challenges, as most data sets available for research are biased. As a sector, we must take active steps to reverse this bias and include diversity across the board from training data sets in algorithms to patient populations studied in drug discovery, acknowledging that this has been a challenge for the industry historically.
Healthcare organizations can collect masses of data, but is the real challenge how this information is interpreted? Is this where AI will deliver the tools needed to use the data available to provide real patient benefit?
We are no longer able to process the vast amount of complex, longitudinal and heterogeneous health data available to us, to find necessary trends and insights, without the assistance of advanced analytical tools such as AI.
By analyzing various forms of data, such as real-world patient data and phenotypic and genetic data, with cheaper processing power and machine learning, we can better understand the underlying biology. This was fundamentally not possible previously, and has been enabled by the convergence of more affordable computing power (Cloud), advanced analytics (AI/ML), the availability of advanced biological techniques (stem cells, scRNA-seq), the falling cost of genome sequencing, and digital tools to assist with data capture.
The availability of data combined with advanced analytical tools now allows us to question our classical, and perhaps outdated, approaches to understanding biology and disease manifestations. For example, Inflammatory Bowel Disease (IBD) and heart failure, which are each currently classified as a single disease, are in fact syndromes made up of multiple pathologies. By classifying them as one disease and treating all patients with those symptoms with one treatment, we are mismanaging these patients and not providing the best care possible for those who are suffering.
If instead we consider them to be a spectrum of disorders and apply advanced analytical tools to patient outcomes captured in EPR, we can model the patient outcomes, such as the response to treatment, rather than the symptoms, and understand which biomarkers place a patient on which trajectory. This allows us to better care for our patients, but also to focus future drug discovery and development on the patients who are not responding to current lines of therapy.
EPR provides an excellent example of the power of data-driven healthcare. The information is complex and heterogeneous (blood work, images, outcomes, vitals, labs, medications, co-morbidities and demographics), making it well suited to analysis with sophisticated AI methods. AI/ML methods require large, complex data sets to showcase their effectiveness over other methods.
By applying machine learning to EPR data, it is possible to identify signatures of patients who have the same disease but are classified as responders or non-responders to a particular drug, thereby breaking a singular disease into subtypes. These approaches allow us to integrate vast data sets, sub-divide groups of patients with similar disease characteristics, such as those with IBD, and redefine IBD into a new taxonomy.
As FinTech businesses are transforming financial services, will we see a raft of new start-ups creating a HealthTech sector driven by data and analytical technologies?
This is already happening, with the first AI-developed drugs now going into clinical trials and AI algorithms being used to support radiology workflows.
We would welcome collaboration between the NHS, big corporates such as pharmaceutical companies, and health tech start-ups and scale-ups, to accelerate the discovery of drugs and the prevention of disease. For example, Sensyne Health recently collaborated with Bayer to identify patient populations in one disease area, by assessing and collating anonymized patient data with anonymized EPR information to better understand the disease heterogeneity within heart failure. This would not have been possible without the power of technology supporting clinicians in analyzing the real-world evidence, which was only made available through a positive relationship with NHS trusts.
Sensyne also collaborates with NHS Trusts under Strategic Research Agreements to combine clinical artificial intelligence technology with ethically sourced, anonymized patient data to help improve patient care, accelerate the discovery and development of new medicines, and enhance our understanding of disease and treatment. This two-way collaborative approach is essential for creating value from the analysis of anonymized patient data in an ethical way.
Where do you think the next major innovation in the health tech sector will come from?
My vision for the near future is to have large-scale shareable banks of patient samples, linked to electronic patient records and outcomes, to truly change the paradigm from “bench to bedside” to patient-centric drug discovery, where patient data, patient outcomes and patient samples are pivotal in every step of the drug discovery process. Along with medical records, data from remote monitoring apps can supplement EPR data sets to help us understand chronic diseases and their management in the home. Aggregated, this information will genuinely change the way we understand disease biology, define disease and improve patient care.
I believe that the drug discovery process we know today will be completely transformed thanks to access to high-quality patient data. A drug discovery program will start with medical record data to define the target product profile, and samples from patients will be used throughout drug discovery and development. This moves away from a linear process, running from target ID to clinical trials, towards an iterative one. It is akin to how agile ways of working transformed the technology sector: working with patient data and patient samples, and making the entire drug discovery and development process more iterative, would allow us to fail faster and reduce the cost of innovating in healthcare.
Most critically, I hope that we will no longer need randomized controlled trials as we run them today. Imagine that, instead of people being subjected to the lengthy and costly drug creation process, we have simulated in silico trials where the impact of compounds is modelled in virtual patients without needing to expose patients unnecessarily to drugs or placebos, only enrolling people who are predicted to respond to the drug and, in theory, accelerating the whole process of bringing the drug to market.