Google DeepMind has introduced a new AI tool for generating video soundtracks. In addition to using a text prompt to generate audio, the tool also takes the contents of the video into account.
According to DeepMind, by combining a text prompt with the video’s content, users can create scenes with “a drama score, realistic sound effects or dialogue that matches the characters and tone of a video”. Examples are posted on DeepMind’s website, and the results sound quite good.
For example, for a video of a car driving through a cyberpunk-style cityscape, Google used the prompt “cars skidding, car engine throttling, angelic electronic music” to generate the audio. The sound of the skidding tires is synced with the car’s movements. Another example creates an underwater soundscape using the prompt “jellyfish pulsating under water, marine life, ocean”.
Although users can include a text prompt, DeepMind says it is optional. Users also do not need to manually align the generated audio with the matching scenes. According to DeepMind, the tool can produce an “unlimited” number of soundtracks for a video, giving users an endless supply of audio options.
This could help the tool stand out from other AI tools, such as the sound effects generator from ElevenLabs, which uses text prompts to generate audio. It could also make it easier to pair audio with AI-generated video from tools like DeepMind’s Veo and OpenAI’s Sora (the latter is expected to incorporate audio eventually).
DeepMind says it trained the tool on video, audio, and annotations containing “detailed descriptions of sound and transcripts of spoken dialogue”. This allows the video-to-audio generator to match audio events with visual scenes.
The tool still has some limitations. For example, DeepMind is working on improving its ability to synchronize lip movements with dialogue, as shown in a demonstration video. DeepMind also noted that the video-to-audio system depends on video quality, so grainy or distorted footage can lead to a “noticeable drop in audio quality”.
DeepMind’s tool is not yet publicly available, as it still has to undergo “rigorous safety assessments and testing”. When it is released, its output will carry Google’s SynthID watermark to indicate that it was generated by AI.