The unveiling of the new GPT-4o model by OpenAI marks a significant milestone in artificial intelligence, with the company claiming it represents a step closer to natural interaction between humans and computers.
The new model can take any combination of text, audio, and images as input and generate output in the same range of formats.
Moreover, it can recognize emotion, analyze facial expressions, be interrupted mid-sentence, translate spoken language in real time, and respond at close to human conversational speed.
Mira Murati, Chief Technology Officer at OpenAI, said during the presentation: “The standout feature of GPT-4o is that it brings GPT-4-level intelligence to everyone, including our free users. This is the first time we are taking a big step forward in terms of ease of use.”
During the presentation, OpenAI demonstrated GPT-4o translating directly between English and Italian, helping a researcher solve a linear equation in real time, and guiding another company executive through deep-breathing exercises by listening to his breathing.
OpenAI engineers and the Chief Technology Officer used a phone to show off the new capabilities. They asked the assistant to add more expression while telling a bedtime story, then abruptly asked it to switch to a robotic voice, and later to finish the story in a singing voice.
Later, they asked the assistant to look at the phone’s camera feed and respond to what appeared on screen. The assistant was also able to speak and respond seamlessly while acting as a translator.
These features mark a significant advance over ChatGPT’s current voice mode, in which users can talk to the system but interaction remains limited: the existing version cannot be interrupted and cannot respond to what the camera sees.
The letter “o” in GPT-4o stands for “omni,” highlighting the model’s multimodal capabilities.
OpenAI stated that GPT-4o was trained end to end across text, vision, and audio, meaning a single neural network processes all incoming and outgoing signals.
This differs from the company’s previous models, GPT-3.5 and GPT-4, which handled spoken questions by first converting speech into text; the model then worked only on that text, stripping out tone and emotion and making interactions slower.
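The announcement itself includes no code, but as a rough illustration of what a combined text-and-image request to the model can look like in practice, the minimal sketch below uses the OpenAI Python SDK; the prompt and image URL are hypothetical placeholders, and audio input is omitted since it is not part of the standard chat endpoint shown here.

```python
# Minimal sketch (not from the announcement): sending text plus an image
# to GPT-4o via the OpenAI Python SDK. The prompt and image URL are
# illustrative placeholders.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What equation is written on this whiteboard?"},
                {"type": "image_url", "image_url": {"url": "https://example.com/whiteboard.jpg"}},
            ],
        }
    ],
)

print(response.choices[0].message.content)  # the model's text reply
```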