
Using ChatGPT AI to comply with the GDPR: a practical guide


While contemporary AIs are a far cry from what science fiction (Skynet, HAL 9000, the Matrix) would have us believe in terms of capability and ‘consciousness’, they still raise their share of questions, particularly when it comes to personal rights and freedoms.
These AIs are algorithmic assemblies capable of performing tasks that, if carried out by a human, would require time, intelligence or creativity. The last few months have seen a flood of new AIs with a variety of uses. The first to come to mind are, of course, conversational AIs (capable of holding a conversation as a human would) such as ChatGPT, Bard and Grok. There are also generative AIs such as Midjourney, which can create images, voices or even videos from a text description.

The rise of conversational AI such as ChatGPT

The loudest of all these AIs is, of course, ChatGPT. From GPT-1, released by OpenAI in June 2018, to GPT-4, the model behind the current version, the technology has improved enormously and its possibilities keep expanding.

What is conversational AI?

These conversational AIs are all based on one principle: the Large Language Model (LLM). This is an artificial intelligence model designed to understand and generate natural-language text, in other words, text as a human would write it. To work, these models rely on massive artificial neural networks whose connections are weighted by “parameters”, trained on immense volumes of textual data so that they learn to predict the next word from what precedes it.
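
To make the principle concrete, here is a deliberately crude sketch of next-word prediction using simple word-pair counts. Real LLMs learn billions of neural-network parameters rather than counting pairs; this toy only illustrates the idea of “predicting the next word from what precedes it”.

```python
# Toy illustration of next-word prediction using bigram counts.
# Real LLMs use deep neural networks, not frequency tables.
from collections import Counter, defaultdict

corpus = "the cat sat on the mat the cat ate the fish".split()

# For each word, count which words follow it in the corpus.
following = defaultdict(Counter)
for current_word, next_word in zip(corpus, corpus[1:]):
    following[current_word][next_word] += 1

def predict_next(word: str) -> str:
    """Return the word most frequently seen after `word`."""
    candidates = following.get(word)
    return candidates.most_common(1)[0][0] if candidates else "<unknown>"

print(predict_next("the"))  # "cat" (it follows "the" most often above)
print(predict_next("sat"))  # "on"
```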

By way of illustration, GPT-1 had a neural network based on 117 million parameters. OpenAI has not disclosed GPT-4’s parameter count, though unconfirmed estimates run as high as 100 trillion. Note that Google has also embarked on the conversational AI adventure with its own model, Bard, which reportedly runs on “only” 137 billion parameters, a gap often attributed to Bard arriving on the market well after ChatGPT.

Created by OpenAI, in which Microsoft has invested heavily, ChatGPT is a success, with roughly 100 million active users every week. GPT-4 is not free, but version 3.5 (175 billion parameters) is available free of charge, albeit less sophisticated than the paid version. Multiplying parameters means the volumes of training data must be colossal. This data is drawn from the Internet, acquired databases and user input on the platform itself. Since it is likely to contain a large amount of personal data, it raises major ethical issues for this technology: free-text fields filled in by Internet users, CVs, research papers bearing their authors’ names, in short, anything that can be found on the Internet. Finally, if inaccurate or biased data is used to train AIs, the responses of those AIs will reflect the same biases.

The main uses of conversational AI

The vast field of application of conversational AI

The strength of AI lies above all in the wide variety of uses it can be put to. These tools can automate a wide range of processes: customer service, reservations, technical support, chatbots, education and even automatic text drafting. One example is Onclusive, which laid off 217 of its 383 French employees responsible for media monitoring and summaries, replacing them with an AI that would, on the face of it, be more efficient. Another widespread use of ChatGPT is the generation of computer code from text instructions.
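
As an illustration of that last use, here is a hedged sketch that asks a conversational model to produce code from a plain-text instruction via the `openai` Python package (v1 API). It assumes an `OPENAI_API_KEY` environment variable is set; the model name is illustrative and depends on your account.

```python
# Sketch: generating computer code from a text instruction.
# Assumes the `openai` package (v1+) and an OPENAI_API_KEY env variable.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4",  # illustrative model name
    messages=[
        {"role": "user",
         "content": "Write a Python function that validates an e-mail address."},
    ],
)

print(response.choices[0].message.content)
```

Bear in mind that anything typed into such a prompt may be retained by the provider, which is precisely where the GDPR questions discussed below arise.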

Interoperable tools

It is also possible to connect these conversational AIs with other generative AIs. A recent example of this is a Twitch channel that was launched in August, on which Emmanuel Macron could be seen answering questions from Internet users 24 hours a day. This was obviously not the real President of the Republic, but a combination of conversational AI that analysed chat questions and formulated text answers, and two generative AIs responsible for creating Emmanuel Macron’s animated image and voice respectively.
This last example clearly demonstrates the disinformation risks these technologies carry. What’s more, they are evolving amid a degree of legal uncertainty, and the threat of malicious cyber activity has never been greater.

From cyber risk to the legal insecurity of conversational AI

Data security

Beyond the ethical and legal issues, these technologies face a real cyber risk. Among the cyber threats facing individuals are practices such as identity theft and phishing (receiving a misleading e-mail containing a malicious link). Whatever the cyber-attacker’s objective, they need a large amount of data to refine their attack. If these AI tools are insufficiently secured, the data they hold can be harvested by malicious parties. The more personal information available, the more likely recipients of phishing e-mails are to click on the links they contain; likewise, the more access the attacker has to this type of information, the more likely the identity theft is to succeed.
In the light of this analysis, it is clear that AIs, which by their very existence require monumental databases to be built up, are becoming a prime target for cybercriminals. A veritable goldmine.
Data security in the use of AIs is a particular concern for the supervisory authorities.

AI GDPR compliance and best practice

Since 2018, the GDPR (General Data Protection Regulation) and its legislative successors (such as the Digital Services Act, or DSA, which strengthens major platforms’ obligations to combat misinformation) have greatly strengthened the mechanisms protecting individuals’ digital privacy. However, the GDPR does not specifically address AI.

Right to information

The GDPR lays down the main principles that entities must respect when processing individuals’ personal data. These include the right to information: any data controller wishing to collect and use people’s personal data must provide clear and intelligible information about the processing operations carried out on that data. If you use conversational AI in your activities or on your website, you must say so in the privacy policy accessible on your website. For example, if you run an e-commerce website and use a chatbot, Admeet can help you ensure that your e-commerce website complies with the GDPR and formalise your policy using a privacy policy generator.

Legal basis

Training a conversational AI demands such enormous volumes of data that personal data initially collected for another purpose regularly has to be re-used. This new processing must therefore have its own legal basis.
The consent of the data subject is, a priori, required. This means asking people, either when the data is collected or afterwards, for permission to re-use it for AI training. Some companies may consider that this processing can rest on their legitimate interests, but it is difficult to anticipate how the supervisory authorities will assess that argument.

Data minimisation

Another principle laid down by the GDPR is data minimisation: a data controller (the company) must collect only the data it strictly needs to achieve the purpose of the processing. Here again, the very principle of AI training makes all existing data potentially useful, which makes this minimisation principle difficult to comply with.
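
One pragmatic way to move towards minimisation when feeding text to an AI service is to strip obvious identifiers before anything leaves your systems. The sketch below is a minimal illustration using regular expressions; a real deployment would use a dedicated PII-detection tool, and the patterns shown are assumptions, not an exhaustive filter.

```python
# Minimal data-minimisation sketch: redact obvious personal identifiers
# from free text before sending it to a third-party AI service.
import re

PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "phone": re.compile(r"\+?\d[\d .-]{7,}\d"),
}

def minimise(text: str) -> str:
    """Replace e-mail addresses and phone numbers with placeholders."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label} removed]", text)
    return text

prompt = "Customer jane.doe@example.com (+33 6 12 34 56 78) reports a bug."
print(minimise(prompt))
# Customer [email removed] ([phone removed]) reports a bug.
```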

Data retention

The GDPR also requires those responsible for processing personal data to set a retention period for it. Generally speaking, an entity may only retain data for as long as it can justify still needing it for the purpose for which it was collected. This point raises questions about the use of data to improve AI models. ChatGPT user data is deleted “when we no longer need it”, according to OpenAI’s privacy policy. This very vague and general wording is questionable. Furthermore, OpenAI does not disclose what happens to the data used to train its algorithms. Major questions therefore remain about the compliance of these practices.
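
In practice, honouring a retention period means actually purging records once the period the controller can justify has elapsed. The sketch below illustrates the idea; the 12-month period and the in-memory store are illustrative assumptions.

```python
# Sketch: purge personal data older than a justified retention period.
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=365)  # assumed 12-month retention period

records = [
    {"user_id": 1, "collected_at": datetime(2023, 1, 10, tzinfo=timezone.utc)},
    {"user_id": 2, "collected_at": datetime.now(timezone.utc)},
]

def purge_expired(records: list[dict]) -> list[dict]:
    """Keep only records still inside the retention period."""
    cutoff = datetime.now(timezone.utc) - RETENTION
    return [r for r in records if r["collected_at"] >= cutoff]

records = purge_expired(records)
print(len(records))  # only records younger than 12 months remain
```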

The supervisory authorities’ point of view

It was in this context that, in March 2023, the Garante per la protezione dei dati personali (the Italian supervisory authority) temporarily banned ChatGPT for non-compliance with the GDPR (before authorising it again in Italy) on four grounds:

  • No verification of users’ age (under the GDPR, the data of minors below a national age threshold, set at 15 in France, may not be collected without parental consent);
  • No notification to the authorities or data subjects of a personal data leak;
  • ChatGPT users were not informed that the information they entered into the platform could be reused to train models;
  • No legal basis to justify the massive use of data to train models.

Twenty-nine days later, the ban was lifted: the Italian authority considered that guarantees had been provided on the points in question. However, no information on the specific measures OpenAI took to remedy the situation has been made public.
Following complaints from users, the CNIL (the French data protection authority), followed by the Irish (DPC) and German (BfDI) supervisory authorities, also approached OpenAI and the Italian authority to adopt a common approach to analysing the tool’s compliance. The European Data Protection Board (EDPB) then set up a dedicated task force to analyse ChatGPT’s compliance.

The AI Regulation: a new text being examined by the EU Parliament

The European Parliament adopted a first version of the AI Regulation in June 2023. The text is currently being negotiated between the European Council, the European Commission and the Parliament.
The aim of the text is to guarantee the development of safe, traceable, transparent and ethical technologies. The regulator is proposing a risk-based approach, defining different rules and obligations depending on the risk posed by the technology.
For example, AI tools offering a social score (evaluating people and assigning a score according to a “good citizen” scale) will be banned from EU territory, as they present an unacceptable risk.
Conversely, technologies defined as posing a limited risk will simply have to comply with transparency requirements that will enable users to make informed decisions.

Good practice in reducing the risks of using AI

In France, in October 2023, the CNIL unveiled its first responses for privacy-respecting AI innovation, sharing its analysis and practical guidance sheets for using AI in compliance with the GDPR. Here are a few guidelines to help you adopt the right reflexes when using this type of artificial intelligence:

  • Determining the applicable legal regime;
  • Defining a purpose;
  • Determining the legal status of AI providers;
  • Ensuring that the processing is lawful with a specific legal basis and, where that basis is consent, properly tracing how consent was obtained and letting data subjects withdraw it at any time (a minimal sketch of such a consent record follows this list). Admeet’s consent management tool can help you record consent for cookies;
  • Taking data protection into account in the design of the system and in the collection and management of data;
  • Informing individuals in a clear, lawful and specific manner via a privacy policy. Solutions such as Admeet can help you generate one.
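
As flagged in the list above, consent must be traceable and revocable at any time. Here is a minimal sketch of what a consent record could look like; the field names and in-memory approach are illustrative assumptions, not a reference implementation.

```python
# Sketch: tracing consent and allowing withdrawal at any time.
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

@dataclass
class ConsentRecord:
    user_id: str
    purpose: str                      # e.g. "re-use of inputs for AI training"
    given_at: datetime
    withdrawn_at: Optional[datetime] = None

    @property
    def is_active(self) -> bool:
        return self.withdrawn_at is None

    def withdraw(self) -> None:
        self.withdrawn_at = datetime.now(timezone.utc)

consent = ConsentRecord("user-42", "re-use of inputs for AI training",
                        datetime.now(timezone.utc))
consent.withdraw()
print(consent.is_active)  # False: this basis can no longer justify processing
```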

In conclusion, it is possible to use AIs like ChatGPT in a way that complies with current regulations. These should become clearer once the European Union’s AI Regulation is adopted. There are societal issues at stake, between technological innovations and the protection of individuals’ rights and freedoms.