Dr. Claudio Fantinuoli

Blog about technology, speech, interpreting and the like

Data Privacy in AI Translation and Interpreting

  • AI development and use must respect the rights to privacy and data protection
  • Any AI application may present a risk of data insecurity or leakage
  • Legal and/or hardware mechanisms can be put in place to ensure the safe use of AI applications

Data privacy is a critical concern when using services, whether they are provided by humans or machines. There are many valid reasons for this: you may have confidential information that you do not want others to access, such as business strategies, financial data, or personal health details. Or you simply do not want others to profit, directly or indirectly, from your data.

In the fields of translation and interpretation, this concern is especially relevant. Consider a document or a meeting discussing sensitive business information, a company’s financial strategy, or a person’s health status: these are details that must remain confidential. In human interpretation, for example, professionals are bound by ethical codes and non-disclosure agreements (NDAs), and there is a general assumption that these secrets will be safeguarded. Think of the U.S. congressional committees that in 2019 considered subpoenaing President Donald Trump’s interpreter to testify about what was discussed in his meetings with Russian President Vladimir Putin. Here is the interview with my former colleague at CNN making the point of why this was not okay. Ultimately, in the name of professionalism and adherence to the code of ethics, the request never materialized. While professionalism may not offer an absolute guarantee, it serves as a strong safeguard for data security.

But what about AI translation and interpreting? There are growing concerns about data privacy when it comes to the use of AI services. I would argue that such concerns are well placed since we all know that data is the new gold. In the emerging Clash of Interpretations, the debate over who should be entitled to provide real-time translation services, data security has become a central argument in promoting the narrative that AI is inherently problematic. The screenshot on the left provides a good example of this narrative.

Since the main purpose of this blog, and of my broader effort in discussing these topics, is to provide a realistic perspective on the present and the future, I believe it is essential to clarify some technical aspects in order to better understand the issues at hand. My perspective derives from my hands-on experience in developing and running such systems at scale.

First of all, the question: are concerns about data (in)security justified? Absolutely. This is especially relevant for AI services operating in the cloud, which is currently the most common delivery method. In cloud-based services, data processing occurs on remote servers, often located miles away from the user’s device. The user’s computer essentially acts as a terminal for entering data and receiving results. For AI-powered interpretation, for example, speech is captured by a device, sent to a remote server for processing, and the translated audio is then returned, all within a fraction of a second.
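To make this round trip concrete, here is a minimal client-side sketch in Python. The endpoint URL, field names, and response format are hypothetical placeholders rather than any real provider’s API; the point is simply that the local machine only sends audio and receives the result, while all processing happens remotely.

    # Minimal sketch of the cloud round trip: the local machine captures audio,
    # sends it to a remote server, and plays back the translated result.
    # The endpoint, parameters, and response format are hypothetical.
    import requests

    API_URL = "https://api.example-interpreting.com/v1/translate-speech"  # hypothetical
    API_KEY = "your-api-key"

    def translate_chunk(audio_bytes: bytes, source_lang: str, target_lang: str) -> bytes:
        """Send a short audio chunk to the remote service and return translated audio."""
        response = requests.post(
            API_URL,
            headers={"Authorization": f"Bearer {API_KEY}"},
            files={"audio": ("chunk.wav", audio_bytes, "audio/wav")},
            data={"source": source_lang, "target": target_lang},
            timeout=5,  # real-time interpreting needs a tight latency budget
        )
        response.raise_for_status()
        return response.content  # translated speech, synthesized server-side

    if __name__ == "__main__":
        with open("speech_chunk.wav", "rb") as f:
            translated_audio = translate_chunk(f.read(), "en", "de")
        with open("translated_chunk.wav", "wb") as f:
            f.write(translated_audio)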

So, the risk of data leakage somewhere in the cloud is real (read here about history’s biggest data leaks). So is the risk that the service provider might retain your data. It is crucial to understand that this risk is not inherent to AI itself. When a model is used, for example to translate a text (i.e., data is sent to obtain an answer), the data is not retained by default because of AI. Data is only recorded if the service provider specifically opts to retain it. In other words, AI does not require your data to be retained; retaining it is a choice. There are numerous legitimate reasons for data retention and, unfortunately, some illegitimate ones as well. Data retention can be short-term, for instance to provide an improved, personalized service. Alternatively, it can be long-term, such as when data is used to enhance AI models, benefiting the provider as well, or even primarily.
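A schematic, provider-side sketch may help illustrate this. Nothing below corresponds to a real service; it only shows that the model call itself is stateless, and that storing inputs or outputs is an explicit configuration choice layered on top of it.

    # Illustrative provider-side handler: the translation call saves nothing by
    # itself; data is archived only if the provider has enabled retention.
    # All names here are invented for the sketch, not a real service.
    from dataclasses import dataclass
    from typing import Callable, Optional

    @dataclass
    class RetentionPolicy:
        store_inputs: bool = False    # e.g., to personalize future translations
        store_outputs: bool = False   # e.g., to improve the provider's own models
        retention_days: int = 0

    def archive(source: Optional[str], target: Optional[str], days: int) -> None:
        """Placeholder for whatever storage backend a provider might use."""
        pass

    def handle_request(text: str, translate: Callable[[str], str],
                       policy: RetentionPolicy) -> str:
        translation = translate(text)  # stateless inference: nothing is saved here
        if policy.store_inputs or policy.store_outputs:
            # Retention happens only because the provider chose to enable it,
            # ideally as agreed with the customer. The model does not require it.
            archive(text if policy.store_inputs else None,
                    translation if policy.store_outputs else None,
                    policy.retention_days)
        return translation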

Does this mean AI services are inherently insecure? Not necessarily. There are two primary ways to ensure data security. The first is to use a service that adheres to high standards of security and data protection (read here about the impact of the EU’s GDPR on data protection in AI services). There are numerous certifications and legal agreements in place to ensure that your data remains protected, much like the security measures that protect your money in an online bank account, your email, or your photographs. These legal and technical frameworks are designed to keep your information safe. They guarantee that your data will not be stored or used for anything other than what you have agreed to. As said before, there are very valid reasons why you might want your data to be used, for example to improve the service you are employing (say, using your data to improve translation quality in your use case) or to obtain additional services (say, a downloadable transcript of your meeting). The possible scenarios are many, and each should be explicitly agreed on. As in any professional agreement, trust and transparency are key.

Let’s be clear: even the best intentions of the provider and the most stringent certifications cannot guarantee absolute security. Just as hackers can steal digital money, they can also steal digital data, including sensitive speech data. But again, these scenarios are not inherent to AI; they are possible, even if rare, for anything that happens in the cloud. When the stakes are very high, there is another layer of security that can all but eliminate the risk of a data leak.

This second layer of security is possible when AI services run locally on a device. In this scenario, no data leaves the machine, which means there is practically no risk of data leakage, even in the presence of bad actors. So, why isn’t local AI more widespread? The simple answer is complexity and computational power. Cutting-edge technologies, such as machine translation and interpreting services, require significant computing power that most personal devices cannot provide. And the technology behind the scenes is still so complex that it is practically impossible, today, to deploy it on the edge. However, this does not mean local AI solutions are impossible, at least in principle. Many AI models, like speech recognition systems and even large language models, can now be deployed locally. For example, the open-source model Whisper can be run on a standard computer, and more powerful models can operate on local private infrastructure.
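As a small illustration, running Whisper locally can be as simple as the following sketch, using the open-source openai-whisper package; the model size and file name are just examples, and nothing leaves the machine.

    # Minimal local speech recognition with the open-source Whisper model
    # (pip install openai-whisper). Everything runs on the local device:
    # no audio or text is sent to a remote server.
    import whisper

    model = whisper.load_model("base")              # small enough for a standard laptop
    result = model.transcribe("meeting_audio.mp3")  # path to a local recording
    print(result["text"])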

This is also true for speech translation. While the complexity of running these systems locally remains significant, the trend is clear: more AI models are becoming deployable on individual devices. Over the next few years, we can expect to see an increase in offline devices capable of running AI interpreting systems. Imagine a simple box that performs real-time translation, securely processing everything on the device itself. In such a scenario, data privacy concerns would be eliminated by means of hardware and not only by means of legal contracts.
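To give an idea of what such a box could look like in software, here is a rough sketch chaining local speech recognition (Whisper) with a local machine translation model downloaded from the Hugging Face hub. The specific models, the language pair, and the missing speech synthesis step are simplifications of my own, not a description of any existing product.

    # Sketch of a fully local "interpreting box": speech recognition and machine
    # translation chained on-device, so no data ever leaves the machine.
    # Model choices (Whisper "base", Helsinki-NLP/opus-mt-en-de) are examples;
    # a local speech synthesis step would complete the speech-to-speech chain.
    import whisper
    from transformers import pipeline

    asr_model = whisper.load_model("base")
    translator = pipeline("translation", model="Helsinki-NLP/opus-mt-en-de")

    def interpret_locally(audio_path: str) -> str:
        transcript = asr_model.transcribe(audio_path)["text"]                        # local ASR
        translation = translator(transcript, max_length=512)[0]["translation_text"]  # local MT
        return translation

    print(interpret_locally("speaker_segment.wav"))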

As technology advances, we are moving toward a future where AI services can offer both efficiency and full data security, making the choice between cloud-based and local solutions an important one.

To go deeper into the topic, I would like to suggest this publication by the Council of Europe: Artificial intelligence and data protection.

One response

  1. Hassan Mizori

    Thanks, Claudio.
