Many people hold a static view of what an AI interpreter is or will be: a tool that translates literally and blindly, no matter how unclear or garbled the original speech is, whether it is mispronounced, unintelligible, or ambiguous; a mechanical device capable only of direct, word-for-word translation. In other words, a piece of software that translates everything, even what it has not correctly understood, often producing awkward, nonsensical, or even inappropriate translations.
This perception isn’t entirely wrong, considering the limitations of the technology available to users today. However, it is quite short-sighted. AI interpreters are evolving rapidly, and it is possible to argue that they are on the verge of gaining a form of agency. In the not-so-distant future, we will enter the era of what I call, for lack of a better term, Agentic AI Interpreters: AI systems capable of understanding context, making decisions, and therefore producing, at least in theory, more accurate and nuanced translations of spoken content.
In this context, by agency I specifically mean the machine’s ability to make decisions about its actions, granting it a kind of active participation in the conversation it is translating. This concept aligns well with Floridi’s framework of “AI as Agency Without Intelligence” (2023): the idea that AI tools can solve demanding tasks (agency) without needing to be (human-like) intelligent.
We can envision several ways in which this agency might take shape in the context of machine interpreting. For the sake of brevity, I will focus on two possibilities that are already feasible, at least in principle and with many limitations:
- The ability to ask questions – allowing the AI to seek further context or clarification when something is unclear, rather than producing a potentially inaccurate translation.
- The ability to adjust translations based on the cultural context – modifying or adapting the translation to better fit cultural aspects, ensuring that meaning and intention are preserved and communicated appropriately across languages.
The first example of agency is the machine’s ability to ask questions when something is unclear (read “Large Language Models are often wrong, never in doubt” on the general limitations of LLMs in recognizing when they do not know or understand). In a speech translation scenario, this could occur, quite obviously, when the auditory signal is not clear. Consider how today’s AI systems handle this situation: they typically approximate what was unintelligible by guessing the closest phonological match (or a similar technique). When the audio is distorted by noise or poor quality, the machine will still attempt to transcribe (in cascade systems) or translate (in direct systems), even when its confidence in what was heard is very low. For instance, if an English speaker says an Italian word with a strong English accent, the language model will likely interpret it as an English word and transcribe it accordingly; errors are inevitable.
Currently, AI systems can already estimate the quality of their acoustic understanding and prevent certain words from being processed if confidence falls below a threshold. Agentic AI Interpreters, however, could take this a step further by actively seeking clarification, for example by asking the speaker to repeat a phrase. This kind of agency can also be applied in more complex scenarios involving meaning. Consider situations where a passage lacks cohesion or coherence, i.e., there are semantic issues. Large language models (LLMs) seem to perform quite well at detecting such problems and could be used to create a feedback loop in which the system asks questions to resolve ambiguities. While this isn’t yet implemented, it’s no longer rocket science.
In short, an agentic AI interpreter could decide to pause the translation process and ask for clarification, resolving both acoustic and semantic uncertainties before proceeding.
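To make this concrete, here is a minimal sketch of such a clarification loop. Everything in it is hypothetical: the confidence scores would come from a real ASR model, `ask_speaker` stands in for prompting the speaker to repeat themselves, and `translate` is a stub for the actual translation model.

```python
# A minimal sketch of an agentic clarification loop. The recognizer, its
# confidence scores, and all function names here are illustrative stand-ins,
# not a real system's API.

CONFIDENCE_THRESHOLD = 0.6  # assumed cutoff below which we ask for clarification

def translate(text):
    # Stand-in for the actual translation model.
    return f"[EN] {text}"

def interpret(segments, ask_speaker):
    """Translate each (text, confidence) segment, pausing to ask for
    clarification when acoustic confidence falls below the threshold."""
    output = []
    for text, confidence in segments:
        if confidence < CONFIDENCE_THRESHOLD:
            # Agentic step: instead of guessing from a garbled signal,
            # request a repetition from the speaker.
            text = ask_speaker(f"Could you repeat '{text}'?")
        output.append(translate(text))
    return output

# Simulated conversation: the second segment was heard with low confidence,
# so the interpreter asks the speaker (here, a lambda) to repeat it.
segments = [("buongiorno", 0.95), ("???", 0.30)]
clarified = interpret(segments, ask_speaker=lambda question: "arrivederci")
```

The same threshold check could gate a semantic feedback loop as well, with an LLM flagging incoherent passages instead of an acoustic score.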
The second example involves the ability to provide additional context about the cultural implications of what people are saying (see this document for a brief but in-depth analysis of cultural elements in translation). This is a common strategy among human interpreters, especially when translating between cultures with significant differences, such as between an American and a Japanese speaker. People engaged in a discussion might not share the same knowledge about certain facts, collective memories, and so on. In these cases, interventions in the translation are essential to keep the communication alive.
Large Language Models (LLMs) have shown remarkable capability in explaining culturally rich language, particularly for the major languages on which they have been extensively trained. They are far from perfect, obviously, since they lack true grounding in reality. But it’s not hard to imagine an AI interpreter that, provided some sophistication, can decide to clarify or elaborate on certain aspects of the source speech, helping listeners fully grasp the intended meaning. For instance, the AI could choose to provide additional context or explanations in the translation, making explicit what might be hidden or implicit in the original language. This goes beyond literal translation, as the AI would actively interpret the message to ensure it is understood in its proper context by speakers from different cultural backgrounds.
Let’s take an example from Italian into English and see how simple prompting techniques, or chain-of-thought prompting, can help adapt the translation. A person says “Quando ti guardo mi ricordi proprio Garibaldi”. The translation obtained with GPT-4o with no special prompting, or with any current NMT system, would be something like:
“When I look at you, you really remind me of Garibaldi.”
Obviously, this doesn’t say much to a non-Italian audience. But when we ask the model to adapt it for an English-speaking audience, we get, depending on the flavor of the prompting, solutions such as:
“When I look at you, you really remind me of Garibaldi, the Italian national hero.” or, with more reference to the physical world, “When I look at you, you really remind me of Garibaldi, with his bold, heroic appearance.”
Now, how to make the final choice remains a hard problem, since it depends on the contextual embedding of the communicative event. Without so-called grounding in the reality of the event, we lack many pieces of the puzzle needed to make an informed decision.
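The prompting step above can be sketched roughly as follows. The LLM call is mocked out with canned responses so the example is self-contained; in a real system `call_llm` would query a model such as GPT-4o, and the prompt wording and the `audience` parameter are illustrative assumptions, not a prescribed recipe.

```python
# Sketch of audience-aware translation via prompting. `call_llm` is a mock:
# real code would replace it with an actual LLM API call.

def call_llm(prompt):
    # Canned responses standing in for model output.
    canned = {
        "literal": "When I look at you, you really remind me of Garibaldi.",
        "adapted": ("When I look at you, you really remind me of Garibaldi, "
                    "the Italian national hero."),
    }
    return canned["adapted" if "adapt" in prompt.lower() else "literal"]

def translate(source, audience=None):
    """Translate `source`; if an audience is given, ask the model to adapt
    cultural references so they remain intelligible to that audience."""
    prompt = f"Translate into English: {source}"
    if audience:
        prompt += (f" Adapt any cultural references so that a {audience} "
                   f"audience understands them.")
    return call_llm(prompt)

source = "Quando ti guardo mi ricordi proprio Garibaldi."
literal = translate(source)
adapted = translate(source, audience="non-Italian")
```

Choosing between candidate adaptations, as noted above, still requires grounding in the communicative event that current systems do not have.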
If we look at the two very basic examples above, it becomes clear that an agentic AI interpreter will become not just a bridge between languages, but a mediator between meanings, cultural nuances, and linguistic acts. It might be the missing link to adapt the translation dynamically, not only conveying words but also the underlying intentions, tone, and unspoken assumptions. At least this is a theoretical possibility.
This future, where AI interpreters can make these sophisticated decisions, is closer than many might think. However, there are still significant challenges and open questions. One major hurdle is ensuring that these systems have access to the right cultural and contextual information to make accurate judgments. This is not trivial for underrepresented languages. Additionally, the complexity of language—especially in less commonly spoken languages or dialects—means that extensive training and refinement will be required. Moreover, ethical considerations around machine intervention in human communication need careful attention: how much agency should an AI have in altering or explaining a message? Where do we draw the line between helpful clarification and unwanted manipulation of the original speech?
Despite these challenges, the development of agentic AI interpreters will mark an exciting leap forward in the field of automatic interpreting, promising to make communication across languages not only more accurate but more meaningful.