Microsoft has announced “Vasa 1,” a new artificial intelligence model capable of creating realistic videos of human characters speaking.
The company says that the generated videos feature lip movements synchronized with the audio, along with facial expressions and head movements that make them appear natural.
Microsoft does not intend to release a product or an API based on the “Vasa 1” model because of the risk that the technology could be used to generate deepfakes.
Microsoft has demonstrated how the artificial intelligence model works and highlighted its capabilities. The company states that the model can produce videos at a resolution of 512×512 pixels at up to 40 frames per second.
The artificial intelligence model supports real-time video generation with minimal latency. Vasa 1 can generate up to one minute of high-quality video from a single still image.
The company has focused on the model's ability to produce lip movements that match the audio track, together with fitting facial expressions.
The model gives the user precise control over various aspects of the video, such as gaze direction, head distance, and other elements.
These controls help steer the three-dimensional head position and facial dynamics, making it easier to adjust the output to the user's specifications.
The artificial intelligence model can create videos using artistic images, singing audio, and speech in languages other than English.
Microsoft emphasized that, while it acknowledges the potential for misuse, the technology offers significant benefits, such as promoting equality in education, improving accessibility for individuals facing communication challenges, and providing care for those in need. The company said it is committed to developing artificial intelligence responsibly to enhance human well-being.