In response to the increasing involvement of UK businesses in advanced data analytics, the phenomenon dubbed 'big data', the Information Commissioner’s Office (ICO) published a report on 28 July 2014 entitled 'Big Data and Data Protection', with the aim of helping companies involved in big data initiatives to comply with the Data Protection Act 1998 (DPA).
The report acknowledges the difficulty in producing a robust definition of 'big data', given that the term describes the features of the data being analysed and the method of processing, rather than a specific technology. However, key characteristics of big data are identified, including the '3Vs' of volume (large data sets), variety (combining data from multiple sources, including external and unstructured data, such as Twitter feeds) and velocity (collecting and analysing data in real time as it is generated). Other relevant characteristics include analysing all data held, rather than limited samples, and 'repurposing' data, meaning that it is used for purposes other than those for which it was originally obtained.
Under the DPA, 'personal data' is data relating to an identifiable living individual, whether the individual can be identified from that data alone or only by combining the data with other information. Some data will not constitute personal data (eg location data from GPS-equipped buses), whereas other data will (eg mobile phone location data and purchases made on loyalty cards).
If data is fully anonymised, it is not regulated by the DPA, but organisations need to ensure that they have reduced the possibility of potential re-identification of individuals such that the risk is 'extremely remote'. Even if this standard cannot be met, anonymisation techniques can be a useful way of demonstrating overall compliance. The report cross-refers to the ICO’s Anonymisation code of practice for more detail on implementing an anonymisation process.
The first principle of the DPA requires that the processing of personal data be 'fair and lawful'. This can be particularly important where big data analytics involves repurposing of data in unexpected ways. Businesses are advised to assess whether the analytics is within the reasonable expectations of individual data subjects, which includes the level of transparency at the point of data collection and the extent to which the purpose of the processing is intrinsic to the service for which the data was provided or unconnected with it. The use made of the big data outputs is also relevant: detection of general trends and correlations is more likely to be fair than analytics used to profile individuals or treat them in a particular way.
Under the first principle, processing of personal data must also be justified by reference to one of the conditions in Schedule 2 of the DPA (and, where the data is classed as sensitive, a condition in Schedule 3). The ICO considers the most relevant Schedule 2 conditions to be consent, necessary for performance of a contract, and the legitimate interests of the data controller or third party recipients.
Any consent must be freely given, specific and informed, with the right to withdraw consent in future. If an organisation is repurposing data originally collected on the basis of consent for other purposes, it will need to obtain consent to the analytics unless one of the other conditions can be satisfied, particularly if the analytics would fall outside individuals’ reasonable expectations. If datasets are licensed from third parties, the licensee will need to ensure that the scope of consents obtained covers the proposed big data analytics.
The ICO notes that it may be difficult to rely on 'necessary for the performance of a contract', because big data often involves repurposing the data in ways which go beyond what is necessary to deliver a product or service.
A big data project may well be within the 'legitimate interests' of the data controller (eg customer profiling in order to target marketing more effectively), but this condition requires a balancing exercise, weighing up those interests against any adverse impact on the legitimate interests of the individuals concerned. Also, the processing of personal data must be 'necessary' for those legitimate interests, which 'means that it must be more than just potentially interesting' to use personal data in the pursuit of those interests. If there is another way of meeting that legitimate interest that has less impact on privacy (such as anonymising the data), it may not be possible to rely upon this condition.
The second principle requires that personal data is only collected for specified and lawful purposes and is not processed in any manner that would be incompatible with those purposes. The report argues that this is not a barrier to big data as it does not prevent the repurposing of personal data. Rather, it prohibits the use of personal data for new purposes which are incompatible with the original purposes for which the data was collected, and a key question is whether use of the data for the new purpose is fair and within the reasonable expectation of the relevant individuals. If the new use of data would be unexpected, it would not be considered compatible with the original purposes for which the data was collected and a specific consent may be required.
The third principle of the DPA requires that personal data be 'adequate, relevant and not excessive in relation to the purpose or purposes for which they are processed' and the fifth principle requires that the data 'not be kept for longer than is necessary for that purpose or those purposes'. The big data characteristics of volume and of analysing all the data in a dataset have the potential to conflict with this concept of data minimisation, and whilst the 'variety' of data can be a critical success factor, the collection of data without thought to its relevance to the objective could lead to certain data being deemed to be irrelevant or excessive. Organisations should determine at the outset which datasets they need in order to achieve defined aims. Also, the obligation to limit storage of personal data may conflict with a commercial desire to retain it for future analytics, given the reducing cost of data storage and the ability of analytics software to process massive amounts of data.
After considering other data protection principles in the context of big data, such as data security and data processors, the report then discusses tools which organisations can use to support compliance. These include the use of Privacy Impact Assessments (PIAs) and the 'privacy by design' approach, where organisations 'bake in' privacy-enhancing technologies or other protections at the start of a big data project.
Transparency is a key concept: for data processing to be fair under the DPA, a privacy notice must be provided to individuals stating the identity of the organisation collecting the data, the purposes for which the data will be processed and any other information necessary to ensure that the processing is fair. Organisations can go some way to achieving fairness by ensuring that privacy notices explain the secondary uses of data for analytics purposes. The report states that, if personal data is collected for one purpose and is then repurposed for big data, the organisation needs to update its privacy notice and ensure that affected individuals are aware of this.
The report notes that the proposed EU General Data Protection Regulation contains a number of provisions which will impact on big data analytics. Relevant provisions include the more explicit provisions regarding data minimisation and the requirement to justify processing personal data as opposed to de-identified data, which may discourage speculative collection of identifiable data and encourage anonymisation. Privacy by design and PIAs are also placed on a statutory footing in the proposed Regulation and privacy notices will need to include information on the period for which data is retained.
Whilst some argue that the DPA is inappropriate for a big data world, the ICO’s view is that the flexibility of the data protection principles means that they remain fit for purpose and do not obstruct big data, noting that 'big data is not a game that is played by different rules'.
The Article 29 Working Party has indicated that it will publish its own paper on big data this year, and it will be useful to compare that opinion with the views of the ICO in this report. All guidance is to be welcomed, as the ever-reducing cost of compute and storage capacity and the increasing power and availability of big data software, including open-source systems like Hadoop, mean that the big data phenomenon may have barely begun.
This article was published in E-Commerce Law & Policy in September 2014.
Note: The Article 29 Working Party is an independent advisory body on data protection, comprising representatives from the national data protection authorities of the EU Member States.