Medical data: the key to unlocking the true potential of AI in healthcare

It is well documented that the lack of representation in biomedical research is leading to a data gap that can no longer be overlooked if we are to avoid exacerbating existing health inequalities in the age of digital health.

Advances in machine learning (ML) techniques are allowing the scientific community to unlock the potential of biomedical data and extract valuable insights. Yet amidst the hope sits an uncomfortable reality: not everyone is set to benefit from these advances. At the heart of innovation in healthcare lie the datasets used to train the algorithms, such as data from scientific literature, clinical trials, omics, and real-world patient data. These datasets are the lifeblood of new technologies. Yet they have significant shortcomings, since the majority of medical research is conducted on white and predominantly male populations of European descent. This lack of diversity in data has serious consequences for medical care, as the products discovered through the use of this data may not benefit everyone.

As a data-driven artificial intelligence company, BenevolentAI recognises the value of having access to diverse datasets and has launched the Data Diversity Initiative (DDI) to address the challenge, raise awareness and work with stakeholders in the wider scientific and research community on possible solutions. At our inaugural DDI event in November 2019, our panel of experts discussed why we must drastically improve the way we design, collect and process biomedical research to ensure that health data fulfils its true potential.

Firstly, the panel discussed the need to ask the right questions when designing research studies, in particular: does the study represent the patient population accurately? The rich diversity of the UK’s NHS clinical population represents huge untapped potential for increasing diversity in data. However, Jackie Hunter of BenevolentAI (CE, Clinical and Strategic Partnerships) argues that this is often not fully embraced, especially in clinical trials.

When it comes to collecting diverse data, Dawn Duhaney, Product Manager at the Wellcome Trust, highlighted the need to build trust and increase transparency in the research recruitment process. People from diverse backgrounds may not trust medical or research institutions enough to engage with them, so work needs to be done to reach underrepresented groups and to communicate how companies or institutions plan to use their data, and for what purpose.

This leads to another critical challenge: health data’s poor interoperability. Clinical medical records on diverse populations exist to a certain extent, but this data is held in siloed repositories such as electronic medical records, laboratory and imaging systems and physician notes. In fact, the World Health Organization estimates that less than 20% of medical data is available in a form that ML and AI can ingest and learn from. The highly fragmented nature of medical data makes it very difficult and time-consuming to access, share or combine. Improving data interoperability would make it easier for health technology innovators to merge clinical medical records, thereby enabling access to a more diverse data pool.

Some countries have seen more coordinated efforts to enable data accessibility. As Diane Harbison, CEO of Decipher Analytics, explains, electronic health records have been in use for longer in Scotland, and because Scotland's healthcare board uses similar systems to track patients across the country, more longitudinal patient data is accessible. These coordinated efforts make it easier to access high-quality datasets and maximise the chance of producing research that will have a positive impact on patient health. While Scotland does not have the most diverse patient population, this is a model that could be imitated elsewhere in the world.

We must be cognisant of the uncomfortable reality surrounding the application of AI in healthcare: if we feed new technologies imbalanced datasets, we will create an imbalance in those who benefit from tech-powered innovation. To ensure transferability and equality of medical treatment, we must improve the way we design, collect and process biomedical research.
