Meta has developed a new suite of AI models called “Seamless Communication” to enable more natural and authentic communication across languages, a step toward universal speech translation.
The flagship model, Seamless, combines the capabilities of three other models (SeamlessExpressive, SeamlessStreaming, and SeamlessM4T v2) into one unified system.
According to the research paper, Seamless is the first publicly available system that enables expressive cross-lingual communication in real time.
Last August, Meta unveiled the new AI-powered translation model, SeamlessM4T, which supports text translation in approximately 100 languages and speech translation in 36 languages.
With the updated v2 architecture, Meta is extending this tool to make translations of conversational speech more spontaneous and expressive, with the aim of enabling authentic cross-language conversations.
Seamless combines three advanced neural network models to provide real-time translation across nearly 100 spoken and written languages while preserving the speaker’s voice, emotion, and tone.
SeamlessExpressive focuses on preserving the speaker’s vocal style and emotional expression during translation between languages.
As the paper explains, translations should capture the nuances of human expression, whereas current translation tools typically rely on monotone text-to-speech systems for their output.
The supported languages include English, Spanish, German, French, Italian, and Chinese.
SeamlessStreaming delivers near-real-time translation with a latency of around two seconds, making it, according to Meta, the first model to offer translation at this speed across nearly 100 spoken and written languages.
SeamlessStreaming starts translating speech while it is being spoken, enabling others to hear the translation quickly.
The third model, SeamlessM4T v2, serves as the foundation for the other two models. It is an upgraded version of the original SeamlessM4T model released by the company last year, and its new architecture provides better integration between text and speech outputs.
Meta stated, “Seamless gives us a comprehensive insight into the core technologies needed to transform the concept of global speech translation from mere science fiction into a real-world technology.”
The public availability of the models opens the way to new voice-based communication experiences, such as real-time multilingual conversations using smart glasses, as well as automatic translation of videos and podcasts.
According to Meta researchers, the models may also help overcome language barriers faced by immigrants and individuals struggling with communication difficulties.