03 Mar 2020

Data diversity

It is well documented that the lack of representation in biomedical research is leading to a data gap that can no longer be overlooked if we are to avoid exacerbating existing health inequalities in the age of digital health.

Advances in machine learning (ML) techniques are allowing the scientific community to unlock the potential of biomedical data and extract valuable insights. Yet amidst the hope sits an uncomfortable reality: not everyone is set to benefit from these advances. At the heart of innovation in healthcare lie the datasets used to train the algorithms, such as data from scientific literature, clinical trials, omics and real-world patient data. These datasets are the lifeblood of new technologies. Yet they have significant shortcomings, since the majority of medical research is conducted on white and predominantly male populations of European descent. This lack of diversity in data has serious consequences for medical care, as the products discovered using this data may not benefit everyone.

As a data-driven artificial intelligence company, BenevolentAI recognises the value of having access to diverse datasets and has launched the Data Diversity Initiative (DDI) to address the challenge, raise awareness and work with stakeholders in the wider scientific and research community on possible solutions. At our inaugural DDI event in November 2019, our panel of experts discussed why we must drastically improve the way we design, collect and process biomedical research to ensure that health data fulfils its true potential.

Firstly, the panel discussed the need to ask the right questions when designing research studies, in particular: does the study represent the patient population accurately? The rich diversity of the UK’s NHS clinical population represents huge untapped potential for increasing diversity in data. However, Jackie Hunter of BenevolentAI (CE, Clinical and Strategic Partnerships) argues that this potential is often not fully embraced, especially in clinical trials.

When it comes to collecting diverse data, Dawn Duhaney, Product Manager at the Wellcome Trust, highlighted the need to build trust and increase transparency in the research recruitment process. People from diverse backgrounds may not trust medical or research institutions enough to engage with them, so work needs to be done around engaging with underrepresented groups and communicating how companies or institutions plan to use their data, and for what purpose.

This leads to another critical challenge: health data’s poor interoperability. Clinical records on diverse populations exist to a certain extent, but this data is held in siloed repositories such as electronic medical records, laboratory and imaging systems and physician notes. In fact, the World Health Organisation estimates that less than 20% of medical data is available in a form that ML and AI can ingest and learn from. The highly fragmented nature of medical data makes it very difficult and time-consuming to access, share or combine. Improving data interoperability would make it easier for health technology innovators to merge clinical records, thereby enabling access to a more diverse data pool.

Some countries have seen more coordinated efforts to enable data accessibility. As Diane Harbison, CEO of Decipher Analytics, explains, electronic health records have been in use for longer in Scotland, and because the Scotland healthcare board uses similar systems to track patients across the country, more longitudinal patient data is accessible. These coordinated efforts make it easier to access high-quality datasets and maximise the chance of producing research that will have a positive impact on patient health. While Scotland does not have the most diverse patient population, this is a model that could be imitated elsewhere in the world.

We must be cognisant of the uncomfortable reality surrounding the application of AI in healthcare: if we feed new technologies imbalanced datasets, we will create an imbalance in those who benefit from tech-powered innovation. To ensure transferability and equality of medical treatment, we must improve the way we design, collect and process biomedical research. 
