AI and the use of data in the life sciences sector

Posted: 24/06/2024

AI is having a major impact on life sciences. Its ability to process vast quantities of data and information faster than humans has numerous valuable applications in the sector, from creating new drugs and targets and diagnosing disease through medical devices, to monitoring patients remotely and analysing data such as MRI scans.

However, in utilising AI in the life sciences sector, it is crucial that developers and users consider issues such as data quality, bias, fairness, and data privacy, and actively work to minimise or mitigate any harmful consequences. Data protection and intellectual property rights must also be upheld at all stages of the AI lifecycle. This matters all the more given the proliferation of ‘black box’ AI models in the sector, which can limit transparency and explainability.

The UK government’s approach to AI regulation differs from the EU approach of detailed regulatory legislation. Instead, the UK’s approach is set out in the proposed Artificial Intelligence (Regulation) Bill, which would create a new AI authority to provide a framework and principles for AI regulation. Specific detailed regulation and guidance is delegated to the relevant regulatory authorities, such as the MHRA, the regulatory body for medicines and medical devices.

This article will consider the key issues associated with developing products that incorporate AI in life sciences, and how organisations can address these issues to facilitate the use and approval of their products. A future article will take a more detailed look at data sets and their licensing and use in AI systems.

Transparency and explainability

Transparency and explainability are key issues throughout the sector because AI systems are put to medical use, especially as part of medical devices. They are amplified by the ‘black box’ nature of some AI models, whereby the process of producing outcomes, including the weight attributed to certain factors in a system’s decision making, is not revealed by the developer of the AI or by a user that incorporates the AI as a component of a product.

This can make it difficult to understand how an AI system has reached a particular decision, which is a problem in a sector that relies on transparency and explainability. Where a developer takes a black box approach, its ability to withhold such information may be restricted by regulations such as the UK’s AI Bill, as well as by the MHRA’s proposed approach, which introduces strict transparency requirements, including the full disclosure of AI training data sets.

Transparency and explainability can be provided by design, where developers of systems or the users themselves explain the decision-making process of their systems. Developers can also use licensing agreements to set out guidelines for use of data or systems, such as providing information relating to the types of data used, sources of data, and data ownership, for example.
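As an illustration of transparency by design, the following minimal Python sketch shows a simple linear score that reports the weight each factor contributed to a decision. The factor names, weights and baseline are invented for illustration and have no clinical meaning; more complex models do not decompose this directly and would need dedicated explanation techniques.

```python
# Hypothetical weights for an illustrative linear risk score (assumption:
# these factors and values are invented and have no clinical meaning).
WEIGHTS = {"age_decades": 0.5, "smoker": 1.5, "systolic_bp_over_120": 0.25}
BASELINE = -2.0


def explain(features: dict[str, float]) -> tuple[float, dict[str, float]]:
    """Return the score and the contribution each factor made to it."""
    contributions = {name: WEIGHTS[name] * features[name] for name in WEIGHTS}
    score = BASELINE + sum(contributions.values())
    return score, contributions


# Hypothetical input record.
patient = {"age_decades": 6, "smoker": 1, "systolic_bp_over_120": 2}
score, contributions = explain(patient)

# List the factors from most to least influential for this decision.
ranked = sorted(contributions.items(), key=lambda item: abs(item[1]), reverse=True)
for name, value in ranked:
    print(f"{name}: {value:+.2f}")
print(f"baseline: {BASELINE:+.2f}, score: {score:+.2f}")
```

A per-decision breakdown of this kind is one form of record that could be retained for review, alongside documentation of the data used.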

This can be managed through legal agreements with AI providers, including agreed data audits, indemnities, or other contractual controls. In any event, whether developers or users of AI systems wish to restrict transparency through a black box model or otherwise, this may not be possible in life sciences where regulators such as the MHRA require the ability to review and scrutinise decisions made by an AI system to ensure accuracy, alongside regulation that sets out minimum standards for transparency.

Data quality and integrity

Data quality refers to the accuracy, consistency, relevance and completeness of data sets, and this is a crucial element in ensuring that the outcomes of AI in the life sciences sector are reliable. High quality data is extremely valuable in the sector as the outcomes of AI models have a direct impact on drug development and, where AI medical devices are utilised, on patient diagnosis, treatment and care.
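Some of these dimensions can be measured directly. The sketch below, in Python, checks completeness (missing values), accuracy (implausible values) and consistency (duplicate references) on a handful of records. The field names, sample records and plausible range are assumptions made for illustration, not standards drawn from any regulator.

```python
from typing import Any

# Hypothetical required fields for a patient measurement record (assumption).
REQUIRED_FIELDS = ("patient_ref", "age", "systolic_bp")


def completeness(records: list[dict[str, Any]], fields=REQUIRED_FIELDS) -> float:
    """Share of required field values that are present (not None or empty)."""
    total = len(records) * len(fields)
    if total == 0:
        return 0.0
    present = sum(1 for r in records for f in fields if r.get(f) not in (None, ""))
    return present / total


def out_of_range(records, field, low, high):
    """Records whose value for `field` is present but outside [low, high]."""
    return [
        r for r in records
        if r.get(field) not in (None, "") and not (low <= r[field] <= high)
    ]


def duplicate_refs(records, key="patient_ref"):
    """Reference values that appear more than once."""
    seen, dupes = set(), []
    for r in records:
        ref = r.get(key)
        if ref in seen:
            dupes.append(ref)
        seen.add(ref)
    return dupes


# Invented sample records for illustration only.
records = [
    {"patient_ref": "P1", "age": 54, "systolic_bp": 128},
    {"patient_ref": "P2", "age": None, "systolic_bp": 135},
    {"patient_ref": "P3", "age": 47, "systolic_bp": 1350},  # likely a typing error
    {"patient_ref": "P1", "age": 54, "systolic_bp": 128},   # repeated record
]

print(f"completeness: {completeness(records):.2f}")
print("implausible:", [r["patient_ref"] for r in out_of_range(records, "systolic_bp", 50, 300)])
print("duplicates:", duplicate_refs(records))
```

Checks like these could be written into internal data governance policies as agreed thresholds, with the thresholds themselves set by the organisation.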

Alongside addressing bias in data sets and implementing robust internal data governance policies, data can be cleansed, processed to create better structures for interpretation by AI models, or augmented with artificial data points produced using the existing data. Augmentation needs to be considered carefully in the life sciences sector as any data quality issues in the underlying data can be amplified. Setting data quality standards, and establishing processes to meet them, helps AI models produce better outputs.
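The amplification risk can be shown with a small Python sketch. It assumes a very simple form of augmentation, in which each reading is copied several times with a small random adjustment; the readings, the plausible range and the `augment` helper are all hypothetical.

```python
import random


def augment(values, copies, jitter, seed=0):
    """Return the original values plus `copies` jittered variants of each."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    synthetic = [
        v + rng.uniform(-jitter, jitter) for v in values for _ in range(copies)
    ]
    return list(values) + synthetic


def count_implausible(values, low, high):
    """Number of values outside the plausible range [low, high]."""
    return sum(1 for v in values if not (low <= v <= high))


# Invented blood pressure readings; 1350 is an assumed data entry error.
readings = [128, 135, 1350, 118]
LOW, HIGH = 50, 300  # assumed plausible range

augmented = augment(readings, copies=3, jitter=2.0)
print(count_implausible(readings, LOW, HIGH), "implausible before augmentation")
print(count_implausible(augmented, LOW, HIGH), "implausible after augmentation")

# Cleansing first stops the error from being replicated.
cleansed = [v for v in readings if LOW <= v <= HIGH]
augmented_clean = augment(cleansed, copies=3, jitter=2.0)
print(count_implausible(augmented_clean, LOW, HIGH), "implausible if cleansed first")
```

In this sketch one erroneous reading becomes four after augmentation, which is why the order of cleansing and augmentation matters.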

Bias and fairness

AI systems are often trained using real-world data sets and these can come with baked-in bias. This can result in outcomes produced by AI that are unfair or that perpetuate such bias in society. Biases can also be introduced by algorithms themselves, as well as by any human decision making in the lifecycle.

To mitigate the introduction of bias in the AI lifecycle, care must be taken in the collection of data and the compilation of data sets. This includes the ethical collection and anonymisation of personal data, as well as producing data sets that are appropriately diverse or representative of the population. Such approaches can be prescribed as part of licence agreements between parties in the supply of AI systems and associated training data, but effective policies must also be implemented if users capture and supply their own data for training such systems.
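One way to test whether a data set is representative is to compare the share of each group in the data with its share in a reference population. The Python sketch below does this for a single attribute. The attribute name, the records and the reference shares are invented for illustration, and a real assessment would consider several attributes and their combinations.

```python
from collections import Counter


def representation_gaps(records, attribute, reference_shares):
    """Share of each group in the data minus its share in the reference population.

    A negative gap means the group is under-represented in the data set.
    """
    counts = Counter(r[attribute] for r in records)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no records to assess")
    return {
        group: counts.get(group, 0) / total - share
        for group, share in reference_shares.items()
    }


# Invented training records and reference shares (assumptions).
training_records = (
    [{"age_band": "18-64"}] * 7 + [{"age_band": "65+"}] * 3
)
population_shares = {"18-64": 0.5, "65+": 0.5}

gaps = representation_gaps(training_records, "age_band", population_shares)
for group, gap in gaps.items():
    print(f"{group}: {gap:+.0%}")
```

A tolerance for such gaps could be agreed between the parties and recorded in the agreement governing the supply of training data.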

Ethics and privacy

Patient privacy and the use of personal data in AI systems are widely recognised issues in the deployment of AI in life sciences. Training AI models requires considerable amounts of data. In life sciences applications where AI models are used to improve the accuracy and efficiency of diagnoses, or to speed up drug candidate identification, for example, a significant amount of patient-generated data is required for AI to produce reliable outputs.

The use of this patient data, if it remains linked to a patient, raises questions about patients’ data rights, and how such rights can be upheld. As health data is considered special category data for the purpose of the UK GDPR, processing such data is subject to enhanced protections. In addition to an Article 6 lawful basis, the processing requires an Article 9 special category condition, such as explicit consent from data subjects or justification based on substantial public interest.

Care should be taken by developers and users of AI systems to ensure that such justifications are established and recorded, and that there are appropriate processes to ensure such data is anonymised before any processing by an AI system where identification of a patient is not required.
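A pre-processing step of this kind might look like the following Python sketch, which removes direct identifiers and generalises age into bands before a record is passed to an AI system. The field names are hypothetical. Removing direct identifiers does not by itself guarantee that data is anonymised in the legal sense, as the remaining fields may still allow re-identification, so the risk needs to be assessed for each data set.

```python
# Hypothetical names of fields that directly identify a patient (assumption).
DIRECT_IDENTIFIERS = {"name", "nhs_number", "address"}


def strip_identifiers(record: dict) -> dict:
    """Drop direct identifiers and replace exact age with a ten-year band."""
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    if "age" in cleaned:
        low = (cleaned.pop("age") // 10) * 10
        cleaned["age_band"] = f"{low}-{low + 9}"
    return cleaned


# Invented record for illustration only.
raw = {
    "name": "A. Patient",
    "nhs_number": "000 000 0000",
    "age": 47,
    "diagnosis_code": "E11",
}

prepared = strip_identifiers(raw)
print(prepared)
```

The original record is left unchanged, so the link to the patient can be kept separately, under appropriate controls, where identification is genuinely required.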

Summary

There are many risks associated with the use of AI, and these risks are often amplified in the life sciences sector, as the data used by AI systems can be more sensitive, and the outcomes of the systems higher risk.

For these reasons, such issues should be identified and addressed wherever possible, whether this is through the establishment of internal processes relating to data collection or use of AI, or the creation of robust internal AI and data governance policies. Agreements between parties governing the licensing of training or reference data, or the supply of AI systems, are another useful tool in mitigating such issues, as standards can be agreed to govern the parties’ obligations for data quality and transparency.

Organisations should review all of these aspects as part of their approach to developing or deploying AI systems in their research or operations, or as part of new products, and implement measures to address the specific risks for their AI deployment. Without such measures the desired outcomes of using AI models may not be attainable and their use may not be approved by regulatory authorities.

The potential of AI to benefit the life sciences sector is huge, but it remains a new technology with much to prove before it reaches that potential. With guidance from regulatory authorities, its implementation will hopefully increase.

Latest news

The UK’s AI Bill requires regulators to set up regulatory ‘sandboxes’ for developers to test AI products in a safe environment. The MHRA launched its ‘AI Airlock’ regulatory sandbox on 9 May 2024. The call for the first projects to work within it will commence after an introductory webinar on 23 July 2024, which will explain the requirements for projects to participate in the AI Airlock.

The principles and important considerations for AI models and products described in this article are likely to be important aspects for testing as part of the AI Airlock.
