Most of the time, live communication relies on both verbal and non-verbal means, which are adjusted to the situational needs and communicative objectives of the interlocutors. This naturally plays a crucial role in multilingual communication and machine interpretation as well. For example, a typical verbal means is so-called topicalization, i.e. positioning the most…
Category: Vision
Facial emotion recognition may improve automatic speech translation
Meta AI recently published a new framework (AV-HuBERT) that improves automatic speech recognition by monitoring lip movements, effectively combining Speech with Vision, two of the traditional areas of Artificial Intelligence. By incorporating data on both visual lip movement and spoken language, AV-HuBERT aims to bring artificial assistants closer to human-level speech perception (see META AI…