In May, when OpenAI first demonstrated an advanced, highly realistic, nearly real-time audio mode for its ChatGPT chatbot platform, the company said the feature would become available to paying ChatGPT subscribers within a few weeks.
After several months, OpenAI now says it needs more time.
In a post on OpenAI’s official Discord server, the company said it had planned to start rolling out a trial version of the advanced audio mode to a small group of ChatGPT Plus users by late June, but lingering issues forced it to postpone the launch to sometime in July.
OpenAI wrote: “For example, we are working on improving the model’s ability to identify and reject certain types of content. We are also enhancing the user experience and preparing our infrastructure to scale to millions of users while maintaining instant responses. As part of our gradual deployment strategy, we will begin rolling out the initial version to a small group of users to gather feedback and expand based on what we learn.”
According to OpenAI, the advanced audio mode may not launch for all ChatGPT Plus customers until the fall, depending on whether it passes certain internal safety and reliability checks. However, the delay will not affect the rollout of the new video and screen-sharing capabilities that were showcased separately during OpenAI’s spring press event.
These capabilities include solving math problems from a picture of the question and explaining the various settings menus on a device. They are designed to work across ChatGPT on smartphones as well as on desktop clients, such as the macOS app, which became available to all ChatGPT users earlier today.
OpenAI wrote: “The advanced audio mode in ChatGPT can understand emotional responses and non-verbal cues, bringing us closer to natural real-time conversations with artificial intelligence.” It added: “Our mission is to deliver these new experiences to you carefully.”
On stage during the launch event, OpenAI employees demonstrated ChatGPT responding almost instantly to requests such as solving a math problem written on a piece of paper held in front of a researcher’s smartphone camera.
OpenAI’s advanced audio mode sparked a major controversy over the close resemblance between the virtual voice named “Sky” and the voice of actress Scarlett Johansson. Johansson later issued a statement saying she had hired legal counsel to look into the matter and obtain detailed information on how the voice was developed, adding that she had turned down several requests from OpenAI to license her voice for use in ChatGPT.
OpenAI denied using Johansson’s voice without permission or relying on a soundalike, but it later removed the disputed voice.