Google researchers have announced a new artificial intelligence system that can turn still images into dynamic videos. The model is called “VLOGGER.”
According to a Google specialist, the model can generate natural-looking video clips of a person speaking and moving, complete with body language and gestures, from a single still image.
The technology uses recent machine learning models to generate a sequence of realistic frames and combine them into a video clip.
The technology opens the way to a wide range of potential uses, but it also raises concerns about misuse for deception or for producing fabricated videos on par with advanced deepfakes.
According to a study published by Google’s research team, the model takes a photo of a person and an audio recording as input and generates a video of that person speaking in the voice from the recording, accompanied by matching facial expressions, head movements, and hand gestures.
The videos published by the Google researchers are not perfect: they are short, have static backgrounds, and do not show subjects moving through three-dimensional space. Even so, they represent a significant advance in animating still images.
The researchers used a modern artificial intelligence technique called diffusion models, which have achieved outstanding results in generating images from text.
The team extended these models to video generation by training them on a large dataset covering more than 800,000 distinct identities and 2,200 hours of video.
As a result, the VLOGGER model learned to create videos of people of various nationalities and ages, in different clothing, poses, and environments, with less bias toward any one group.
The VLOGGER model could be applied in many fields: automatically dubbing video clips into other languages by replacing the audio track, filling in missing frames in videos, designing realistic interactive avatars for video games and virtual environments, and building chatbots that interact more naturally with users. However, the risk of misuse remains.