The Personal Data Protection Bill, 2019 (“PDP Bill, 2019”) was introduced by the Ministry of Electronics and Information Technology on December 11, 2019, in the Lok Sabha. The PDP Bill, 2019 is currently being reviewed and examined by a joint parliamentary committee before being tabled before the Lok Sabha. The PDP Bill, 2019 draws heavily from the principles established in the European Union’s General Data Protection Regulation, 2016 (“GDPR”) and the landmark “privacy judgement” of the Supreme Court, Justice K.S. Puttaswamy (Retd.) & Anr. v. Union of India, wherein the right to privacy was upheld as a fundamental right under the Indian Constitution.
The phrase “inference drawn from personal data” or “inferred data” is conceptualised within the definition of “personal data” in the PDP Bill, 2019. Under Section 3(28) of the PDP Bill, 2019, personal data is defined as:
“data about or relating to a natural person who is directly or indirectly identifiable, having regard to any characteristic, trait, attribute or any other feature of the identity of such natural person, whether online or offline, or any combination of such features with any other information, and shall include any inference drawn from such data for the purpose of profiling”
The inclusion of inferred data as personal data has sparked some debate. The PDP Bill, 2019 makes only a rather nebulous reference to inferred data as personal data. This inclusion also departs from the 2018 draft of the Bill, which contained no such reference. The addition expands the scope of personal data covered by the law, yet there is so far no jurisprudence clarifying the scope of protection afforded to such data. It is therefore critical to examine the meaning of the term within the PDP Bill, 2019 and to understand its possible future applicability and ramifications.
“Inferred data” or “inference drawn from personal data” is data generated by a system, not explicitly provided by the user. It generally refers to the data and characterisations that are assigned to a person based on their behaviours and activities, often concerning the consumption of online content. Thus, if the available personal data allows an inference regarding that data principal’s gender, age, income or sexual orientation or any such personal attribute, such inference shall also form a part of the personal data.
Scope: Application and implications
The inclusion of inferred data will further give data principals the right to request such data from data fiduciaries. However, it is unclear whether the reference to inferred data in the PDP Bill, 2019 covers all kinds of insights derived from the data being processed. If it does, there could be serious ramifications for an enterprise’s economic interests.
This is considering that de-identified data (data from which all personally identifiable information has been removed) is already regarded as personal data unless it has been anonymised irreversibly. The expanded meaning and scope of personal data will ensure that inferred data is likewise protected under the PDP Bill, 2019.
On a computer that has been taken over by bots, such data can simply be generated by AI and cognitive services. If this kind of data is considered personal data unless it has been anonymised irreversibly, implementing the requirement “might become nightmarish.” The difficulty is compounded by the fact that irreversible anonymisation has been observed to be unlikely to be achieved anywhere in the world.
Separating inferred data from data sets could be met with serious difficulties, because the inferred data becomes part of the retrained model and cannot be erased, forgotten, or unlearned. It also becomes a permanent part of the model repository. Certain exceptions could be made, for example where inferences are drawn directly from personal data, such as the answer to “what movies do you like?”
The right to confirm, rectify and access personal data will be extended to personal data generated from an individual’s observed behaviour and habits, i.e. inferred data. Such information is clearly personal, and a data principal should be informed that the data controller holds this type of data. However, implementation will be challenging because inferred data is not always data in the sense in which we ordinarily understand it, as observed above.
Furthermore, inferred data sets are an essential stage of Big Data analytics, the process of analysing massive amounts of data to find hidden patterns, correlations, and other insights. In this context, inferred data refers to information derived from a complex system of analytics that identifies correlations between various datasets and uses this information to profile people. Examples include predicting future health outcomes, calculating credit scores, and targeting products to individuals based on preferences drawn from a large volume of diverse data sets. Such data relies on probabilities and can therefore be said to be less ‘certain’ than any kind of derived data. Given the PDP Bill’s ambiguous conception of the term, this type of inferred data cannot be said to be strictly limited to personal data.
Further, the right to data portability will empower data principals to port the data that the data fiduciary has inferred in order to build their profile. The right to data portability allows individuals to receive and reuse their personal data across different services for their own purposes. Data inferred by a data fiduciary is likely to form an integral part of that entity’s business model. Bringing such data within the right to data portability will thus come with its own set of problems.
The data privacy regime in India is still at its nascent stage. Other comparable jurisdictions, on the other hand, follow data privacy conventions and legislation that may provide some guidance on how inferred data should be conceptualised.
The General Data Protection Regulation 2016
The inclusion of inferred data under the PDP Bill, 2019 deviates from the general understanding of personal data under the GDPR. Article 4(1) of the GDPR provides that ‘personal data’ means any information relating to an identified or identifiable natural person (‘data subject’). The GDPR’s definition of personal data makes no express reference to inferred data. Under the material scope of the GDPR, an inference may constitute personal data only when it fulfils the four essential elements of the definition of personal data, which are:
- Inferences drawn as ‘any information’
- Inferences and the link between the inference, the information, and a natural person.
- Inferences and the element of ‘identified / identifiable.’
- Inferences and the element of ‘natural person.’
The Article 29 Working Party was an independent advisory committee comprising a representative from the data protection authority of each EU Member State. It dealt with issues relating to the protection of privacy and personal data until 2018. It distinguished inferred data from other types of data as data generated by the data controller by applying analytics to data provided by the data subject, such as a credit rating or health risk.
The Article 29 Working Party considers both verifiable and unverifiable inferences to be personal data, e.g. medical analysis results. However, it did not address whether the processes and reasoning that led to the inference are classified in the same way. The European Court of Justice has offered no settled view either, as recent cases have been inconsistent. Even if inferences are considered personal data, the rights of data subjects to know about, delete, rectify, port or object to them are significantly curtailed. Even for sensitive inferences, the GDPR provides insufficient protection and insufficient remedies to challenge them.
The European Union’s report on the GDPR’s impact raises concerns about anonymised and inferred data, and about how their entire exclusion from the GDPR’s scope has resulted in enforcement challenges. This issue, however, is less serious than the one raised by the PDP Bill, 2019, because the Bill offers no option for legitimate interest processing, as the GDPR does. Since data such as inferred data is a reasonably natural outcome of processing, the GDPR gives businesses a contractual necessity ground and a legitimate interest ground for processing. These grounds allow businesses to process such data without obtaining the consent of each user individually.
The Article 29 Working Party has also referred to ‘inferred data’ while explaining its exclusion from the scope of the right to data portability:
“the terms “provided by” includes personal data that relate to the data subject activity or result from the observation of an individual’s behaviour but not a subsequent analysis of that behaviour. By contrast, any personal data which have been generated by the data controller as part of the data processing, e.g. by a personalisation or recommendation process, by user categorisation or profiling are data which are derived or inferred from the personal data provided by the data subject, and are not covered by the right to data portability…”.
The California Consumer Privacy Act 2020
The California Consumer Privacy Act, 2020 provides that any “inferences drawn” from personal information to create a profile about a consumer reflecting the consumer’s characteristics, psychological trends, preferences, predispositions, behaviour, abilities, intelligence, attitudes and aptitudes constitute personal information. Furthermore, the legislation also defines “infer” or “inference” to mean:
“the derivation of information, data, assumptions, or conclusions from facts, evidence, or another source of information or data”.
The Australian Privacy Principles
The Australian Privacy Principles (“APPs”) serve as the foundation for the privacy protection framework in Australia, i.e. the Privacy Act 1988. The guide provided for the APPs specifies that inferences drawn about individuals from other data, whether accurate or not, form part of personal information. It further gives an example concerning inferred data:
“Example: An organisation infers information about an individual from their online activities, such as their tastes and preferences from online purchases they have made from their web browsing and/or transaction history. Even if the inferred information is incorrect, it is still personal information”.
The conception of ‘inferred data’ within the realm of personal data provides much food for thought going forward. As observed, the landscape of inferred data in India is one of uncertainty. This necessitates a prompt and comprehensive response, especially as this component of data governance – the regulation of inferred data – appears to encompass both personal and non-personal data.
It is vital to define what comprises personal data using examples such as identifiers, as is done in foreign legislation (The California Consumer Privacy Act and the Australian Privacy Principles).
It has been proposed by independent entities that the PDP Bill, 2019 replace the reference to “inferences drawn from personal data” with “de-identified data used for the purpose of profiling” to avoid any unintended effects of subjective interpretation. Among other things, this would clarify that inferences and insights based on anonymised personal data are excluded from the Bill’s scope.
Future versions of the framework, as well as the rules that will follow, would hopefully address these issues.
(This post has been authored by Devika Bansal, a fourth year B.A. LL.B (Hons.) student at Gujarat National Law University, Gandhinagar.)