Google has introduced Project Astra, an ambitious artificial intelligence agent that the company says represents the future of AI assistants: one that interacts with the world much as humans do, remembering what it sees and hears so it can answer questions about its surroundings.
At its annual developer conference, the company demonstrated how such an agent works, showing a pre-recorded video of an employee walking around an office while the AI assistant used the phone's camera to see the scene and answer questions about it.
The agent correctly identified which London neighborhood the office was in based on the view from the window, and later told the employee where he had left his glasses.
In other words, Project Astra processes visual data in real time, retains what it has seen, and can recall that stored information on request.
According to the company, the agents process information quickly by continuously encoding video frames, combining the video and audio inputs into a single timeline of events, and temporarily caching that data for efficient recall.
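To make that idea concrete, here is a minimal sketch of what such a rolling event timeline might look like. This is an illustrative assumption, not Google's implementation: the names (EventTimeline, Event, recall) are hypothetical, and a real agent would store learned embeddings and use semantic retrieval rather than the plain-text summaries and keyword matching shown here.

```python
import time
from collections import deque
from dataclasses import dataclass


@dataclass
class Event:
    """One encoded moment: a timestamp plus what was seen and heard."""
    timestamp: float
    video_summary: str   # stand-in for an encoded video frame
    audio_summary: str   # stand-in for transcribed/encoded audio


class EventTimeline:
    """Hypothetical rolling buffer that merges video and audio into one timeline."""

    def __init__(self, max_events: int = 1000):
        # Bounded deque: the oldest events fall off, mirroring temporary storage.
        self.events: deque = deque(maxlen=max_events)

    def add(self, video_summary: str, audio_summary: str) -> None:
        """Encode the current frame and audio snippet and append them to the timeline."""
        self.events.append(Event(time.time(), video_summary, audio_summary))

    def recall(self, keyword: str) -> list:
        """Retrieve past events whose summaries mention the keyword
        (a toy keyword match standing in for real semantic retrieval)."""
        kw = keyword.lower()
        return [e for e in self.events
                if kw in e.video_summary.lower() or kw in e.audio_summary.lower()]


# Usage: the "where did I leave my glasses?" moment from the demo.
timeline = EventTimeline()
timeline.add("glasses on the desk next to a laptop", "user walking past the desk")
timeline.add("view of the London skyline through the window", "user asks about the neighborhood")
print(timeline.recall("glasses"))  # -> the event noting where the glasses were last seen
```

The key design point the sketch illustrates is the bounded buffer: keeping only a recent window of encoded events is what lets the agent answer "where did I leave my glasses?" without storing an unbounded video history.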
Google said in a post: “We have made significant progress in developing AI systems that can understand information from multiple sources, but getting response time down to something conversational is a difficult engineering challenge.”
Google is also working to make the agent's voice output more expressive, using enhanced speech models to improve audio quality and give users a more varied, natural-sounding voice experience.
This kind of human-like expressiveness recalls the lifelike pauses of Google's Duplex system, which led some to suggest that the company's AI might be a candidate for passing the Turing test.
According to Google, some of Project Astra's features are expected to reach Gemini in the third quarter of this year.