OSINT: how to comply with the GDPR? (3/4)

  • Warning: this article is the sole responsibility of its author.

While OSINT has no dedicated legal framework today, it must still fit into the French regulatory environment and comply with the European regulations in force, in this case the General Data Protection Regulation (GDPR).

The CNIL (Commission Nationale de l’Informatique et des Libertés, the French data protection authority) has published recommendations regarding the practice of data leak research, which is a branch of OSINT.

Establishing responsibilities

The CNIL recommends that the roles and responsibilities of the actors be clearly divided: each data leak research operation must be framed by a contract between the client company (which will be the data controller) and the service provider (which will be the data processor). Among other things, this contract specifies the obligations of each party and includes the requirements of Article 28 of the GDPR.

Defining the purposes

The contract will also outline the specific purposes of the data processing, as well as what implementing monitoring activities and remediating data leaks would entail.

It’s also important to make sure that data leak research is authorized: it must rely on one of the legal bases provided by the GDPR, such as the legitimate interest of the data controller (for example, when the controller’s own data has leaked and it is searching for it).

There are six main conditions that must be met to justify the use of data leak research:

Justifying a legitimate interest in carrying out data leak research

The security of the network and information systems constitutes a legitimate interest (Recital 49 of the GDPR) that the data controller could put forward. Indeed, the purpose of data leak research is to ensure the protection of an organization’s information by identifying possible data leaks, revealing security flaws inside networks or information systems that may impact the protection of personal data.

Demonstrating that data leak research is necessary to achieve the intended purpose

Given the intrusive nature of data leak research operations (massive data collection and analysis), the organization carrying out the research should be able to demonstrate that there is no other effective way to detect certain data leaks.

This may be particularly the case when the data leak results from the actions of a malicious insider who has legitimate access to the data to perform their work duties. In such cases, the data leak may be difficult to detect despite the measures already in place within the organization.

Balancing the interests of the organization using data leak research with the rights of the individuals whose personal data is processed

When the legal basis of processing is the “legitimate interest of the controller,” the controller must be able to demonstrate that data subjects have a reasonable expectation that their personal data will be used for the controller’s purpose. In the context of data leak research, this means that individuals must be able to expect that their data will be collected and analyzed to ensure the security and protection of the organization’s information assets, particularly in light of the strategic importance of their functions, or the projects they are working on.

While the managers of an organization exposed to the risk of illegitimate access can reasonably expect their names to be monitored, the question remains open for an employee who holds no position of responsibility, has no link to the security of information systems and is not involved in any sensitive project.

However, the reasonable expectations of the persons concerned are not the only element to be considered. The security objective pursued by the controller must be sufficiently important not to create an imbalance to the detriment of the rights and interests of the data subjects. It is therefore important that the organization considers elements such as the nature of its activity and the data that must be protected, but also the objective of protecting the privacy of individuals whose data may have been made publicly accessible. The more sensitive and numerous the data to be protected, the more likely data leak research operations are to be deemed proportionate, on a case-by-case basis.

Defining a limited data retention period

As a reminder, data leak research does not systematically involve the collection of personal data. Personal data is only collected when it is relevant to the purpose of the processing and when it matches the keywords that were defined before the search. Consequently, non-relevant data is never kept after the matching analysis phase.
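The matching phase described above can be sketched as follows. This is only an illustration, not a tool recommended by the CNIL: the `Document` type, its fields and the keyword values are hypothetical, and matching is reduced to a simple case-insensitive substring test.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    source: str  # where the content was found (hypothetical field)
    text: str    # raw content retrieved from that source


def match_keywords(doc, keywords):
    """Return the predefined keywords found in the document (case-insensitive)."""
    lowered = doc.text.lower()
    return [kw for kw in keywords if kw.lower() in lowered]


def filter_relevant(docs, keywords):
    """Keep only documents matching at least one keyword; the rest are dropped."""
    kept = []
    for doc in docs:
        hits = match_keywords(doc, keywords)
        if hits:
            kept.append((doc, hits))
        # Non-matching documents are never stored anywhere.
    return kept


# Illustrative keywords only; real ones are defined before the search.
KEYWORDS = ["PROJECT-ORION", "acme-internal.example"]
docs = [
    Document("paste-site", "dump containing project-orion credentials"),
    Document("forum", "unrelated holiday photos"),
]
relevant = filter_relevant(docs, KEYWORDS)
```

Here only the first document survives the matching phase; the second is discarded without being retained.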

As for any file, a limited retention period must be defined for the collected data. It must be determined according to the purpose of the research. For example, if the data leak research concerns a particular aspect of a strategic project (such as a stage in the submission of a response to an important call for tenders), the retention period should take the specificities of the project into account.

With respect to the results of the research, when data leak research makes it possible to find the data that was initially leaked, it can be kept for the time necessary for legal proceedings and, if necessary, for the analysis of the origin of the violation.

If, despite all the precautions taken, data leak research leads to the collection of data that is not sought by the organization, the data must be deleted immediately after collection (see below, the safeguards to be implemented).

Finally, keywords used for research purposes may be retained for the duration of the contract pertaining to the data leak research.
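The retention rules above could be enforced by a periodic purge along the following lines. This is a minimal sketch under stated assumptions: the category names, the record layout and the 365-day period are hypothetical, and the actual durations must be set according to the purpose of the research.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical retention periods per category; real values depend on the purpose.
RETENTION = {
    "leak_result": timedelta(days=365),  # assumed time needed for legal proceedings
    "unsought": timedelta(0),            # deleted immediately after collection
}


def is_expired(record, now, contract_end):
    """Tell whether a record has outlived its retention period."""
    if record["category"] == "keyword":
        # Keywords may be kept for the duration of the contract.
        return now > contract_end
    return now - record["collected_at"] >= RETENTION[record["category"]]


def purge(records, now, contract_end):
    """Return only the records that may still be kept."""
    return [r for r in records if not is_expired(r, now, contract_end)]


utc = timezone.utc
now = datetime(2024, 6, 1, tzinfo=utc)
contract_end = datetime(2024, 12, 31, tzinfo=utc)
records = [
    {"category": "leak_result", "collected_at": datetime(2024, 1, 1, tzinfo=utc)},
    {"category": "leak_result", "collected_at": datetime(2023, 1, 1, tzinfo=utc)},
    {"category": "unsought", "collected_at": datetime(2024, 6, 1, tzinfo=utc)},
    {"category": "keyword", "collected_at": datetime(2023, 9, 1, tzinfo=utc)},
]
kept = purge(records, now, contract_end)
```

With these sample values, the recent leak result and the keyword are kept, while the old leak result and the unsought data are removed.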

Using all means to avoid collecting irrelevant data

The organization using data leak research must ensure that all means are implemented to avoid collecting data that does not originate from its information systems.

In particular, it must implement all measures to limit the collection of special categories of personal data (e.g. health data, data relating to offences or data concerning sex life), especially when choosing the keywords used for data leak research and when selecting the sites it targets.

If, despite the measures taken, sensitive data that was not initially sought ends up being processed, the organization will, again, have to delete it immediately. The same reasoning applies here as for the processing of personal data by search engines: the prohibition on processing sensitive data applies only retrospectively, once the search engine has been informed of the sensitive nature of the data it holds (on this point, see the technical and organizational measures developed below).

Data leak research providers must ensure that the techniques used during their missions do not constitute an attack on automated data processing systems (which is punishable by law). For example:

– no vulnerabilities must be exploited in order to search for information;

– no security measures should be weakened or deliberately circumvented (no passwords should be broken, default passwords should not be used to enter a system, etc.);

– only information that is accessible without security circumvention should be collected.

Respecting the rights of individuals

Each company that decides to use data leak research must ensure that the rights of data subjects are respected.

People associated with the organization (customers, employees, managers, etc.) can be informed individually, for example through the privacy policy, the IT charter or the employment contract. However, data leak research is likely to involve the processing (or at least the consultation) of data relating to people with whom the organization has no connection.

This is because data leak research involves the analysis of a large amount of content posted online in order to identify content that matches keywords determined by the company that uses it. As a result, informing each data subject individually can be very complex.

When data is not collected directly from individuals, the GDPR may exempt the organization from providing individual information if doing so would be impossible or would require disproportionate effort. However, this exception must be strictly interpreted through a case-by-case analysis and cannot be a general rule. In this case, the organization will have to make the information public, for example by making it available on its website, or provide it to individuals if they are subsequently identified and reachable.

If the data leak research reveals a data breach within the company’s information system, and that breach is likely to result in a high risk to the rights and freedoms of individuals, the data controller will have to communicate information about the breach to each data subject so that he or she can take appropriate measures.

Finally, individuals have the right to access, rectify and erase their personal data, as well as the right to restrict the processing carried out and to object to it. The data controller must make it possible to exercise these rights.

In order to limit the consequences of data leak research on the rights of data subjects, the CNIL recommends the following measures:

  • The keywords defined beforehand should be directly linked to the objectives pursued;

  • The keywords should not include personal data: the CNIL encourages the use of digital markers (“canary tokens”) inserted in the databases beforehand, so that the data is synthetic and does not correspond to that of a real individual;

  • The data leak research should not be designed to target sensitive data such as data concerning a person’s health, sex life, political opinions or sexual orientation;

  • The search should be automated and raise alerts based on keywords;

  • The websites targeted by data leak research should not contain sensitive data by nature (dating sites, sites of political or religious expression, etc.);

  • The search may require the creation of a user account on a legal website. As a reminder, it is prohibited to usurp a person’s identifiers or identity in order to access a system;

  • Human operators should intervene only to validate and analyze the final results of the research. They must be authorized to consult the data, be subject to reinforced confidentiality obligations and be made aware of personal data protection issues, for example through a mandatory training program.
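The canary token and automated alert recommendations above could be combined roughly as follows. This is a hedged sketch, not the CNIL’s method: the token format, the function names and the example URLs are assumptions made for illustration, and retrieving the page contents is left out entirely.

```python
import secrets


def make_canary(prefix="canary"):
    """Create a synthetic marker that corresponds to no real person."""
    return f"{prefix}-{secrets.token_hex(8)}"


def scan(text, canaries):
    """Return the canary tokens present in a piece of online content."""
    return sorted(tok for tok in canaries if tok in text)


def alerts_for(pages, canaries):
    """Map each page URL to the canaries it contains; pages without hits are skipped."""
    alerts = {}
    for url, text in pages.items():
        hits = scan(text, canaries)
        if hits:
            alerts[url] = hits
    return alerts


# Tokens would be inserted in the organization's databases beforehand.
canaries = {make_canary() for _ in range(3)}
leaked = next(iter(canaries))

# Hypothetical page contents, already retrieved by lawful means.
pages = {
    "https://paste.example/abc": f"id;name;{leaked};2024",
    "https://forum.example/1": "nothing of interest",
}
alerts = alerts_for(pages, canaries)
```

Because the search terms are synthetic tokens rather than names or email addresses, an alert signals a leak from the marked database without the keywords themselves containing personal data.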