May 13th, 2024 marks the day OpenAI revealed GPT-4o to the world. It might be the most impressive product OpenAI has announced thus far. In this article, we'll explore a bunch of the awesome use cases showcased throughout the day on OpenAI's YouTube channel.
GPT-4o, the "o" standing for "omni", is yet another leap towards a more natural Artificial Intelligence model. You can communicate with it via any combination of text, audio, and image inputs and receive outputs in any of these formats as well. According to OpenAI, it can respond to audio inputs in as little as 232 milliseconds, with an average of 320 milliseconds, which is comparable to human response times in a conversation. Not only that, but it is also 2x faster and 50% cheaper in the API compared to gpt-4-turbo.
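For developers, GPT-4o is exposed through the same Chat Completions API as earlier models, under the `gpt-4o` model name. Here is a minimal sketch, assuming the official OpenAI Python SDK and an `OPENAI_API_KEY` in your environment; the prompt and image URL are placeholders, not taken from the announcement:

```python
# Minimal sketch: calling GPT-4o with mixed text + image input
# via the OpenAI Python SDK. Prompt and image URL are placeholders.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is happening in this picture?"},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/photo.jpg"},
                },
            ],
        }
    ],
)

print(response.choices[0].message.content)
```

Audio input and output go through a separate speech pipeline in the ChatGPT apps, so this text-and-image call is only a starting point for experimenting with the model.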
Let's dive into the coolest use cases that OpenAI presented throughout the day!
Lullabies and Whispers
In this video you can see that voice chats are not only far faster than what we have today, making the conversation much more fluid and natural, but that GPT-4o is also great at conveying emotions, laughing, and interacting with the user in a more human-like manner.
It is also able to sing and whisper, as well as change its singing pace and the intensity of its whispers according to the user's request to create a relaxing and immersive experience.
GPT-4o as a Tutor
In this video, you can see how you can share your screen with GPT-4o and have it follow along with what you are doing. Here, the model tutors a student through a math problem, helping him reach the correct result step by step.
This can offer students from many different backgrounds a more personalized learning experience at their own pace. The model can also serve as a copilot for other activities, such as video editing and coding.
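The screen-sharing flow in the demo lives in the ChatGPT apps, but a similar step-by-step tutoring behavior can be approximated over the API by sending a screenshot along with a tutoring-style system prompt. A rough sketch, assuming the OpenAI Python SDK; the file name and prompts are illustrative placeholders, not part of the demo:

```python
# Rough sketch of step-by-step tutoring over a screenshot.
# File path and prompts are illustrative placeholders.
import base64
from openai import OpenAI

client = OpenAI()

with open("math_problem.png", "rb") as f:
    screenshot_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "system",
            "content": (
                "You are a patient math tutor. Never give the final answer "
                "directly; guide the student one step at a time with hints."
            ),
        },
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Help me solve the problem on my screen."},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/png;base64,{screenshot_b64}"},
                },
            ],
        },
    ],
)

print(response.choices[0].message.content)
```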
Realtime Translation
Traveling abroad and connecting with people from different cultures and backgrounds just became easier with ChatGPT!
In the video above, you can see how users can leverage the speed of the new voice chat mode to use the model as a real-time interpreter. It hears person A speaking English and immediately translates it into Spanish for person B, who replies in Spanish so that the model can immediately translate the reply back into English for person A.
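The demo runs on ChatGPT's voice mode, with speech handled by the app itself, but the underlying back-and-forth translation can be sketched over the API in text form. A rough sketch, assuming the OpenAI Python SDK; the system prompt and example sentence are placeholders:

```python
# Rough text-only sketch of the interpreter behavior from the demo.
# System prompt and example utterance are illustrative placeholders.
from openai import OpenAI

client = OpenAI()

SYSTEM_PROMPT = (
    "You are a real-time interpreter. When you receive English, reply only "
    "with the Spanish translation. When you receive Spanish, reply only "
    "with the English translation."
)

def interpret(utterance: str) -> str:
    """Translate a single utterance in either direction."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": utterance},
        ],
    )
    return response.choices[0].message.content

print(interpret("How has your week been going?"))
```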
Language Learning
You can now help ChatGPT "see" the world and the environment you're in. You can show it objects on a table, people close to you, lamps and windows, and have the model interact with them. In this video, you can see how this new ability can be used for language learning: GPT-4o teaches you how to name and pronounce the objects right next to you in any language you wish to learn, such as Spanish.
ChatGPT Can Now See the World
As stated in the video above, GPT-4o is a model that interacts with the world through audio, vision, and text.
It is able to recognize brands on a hoodie and objects in a room, as well as infer what the user might be doing, or is about to do, according to what's going on in the environment. It is also capable of expressing human emotions, such as excitement and curiosity.
ChatGPT Meets a Dog
In this video, the model is able to recognize a dog in front of the user via the iPhone camera and interact with both the user and the dog at the same time. What is most impressive is how the model emulates human emotions and behavior, including the silly voice that a lot of people make when talking to a baby or a puppy.
Be My Eyes: Technology to Improve People's Lives
And this might be the icing on the cake.
You've seen so far how this new model is fast, expresses human-like emotions, recognizes the environment around the user, helps people solve math problems, sings, whispers, cracks jokes, and more.
However, this last video is by far my favorite. It showcases how GPT-4o's new ability to see the world through a phone's camera can significantly improve the lives of people who are blind or have other visual impairments. The model is able to describe situations and the surrounding environment, as well as help people hail a taxi.
OpenAI's GPT-4o is another step towards advanced AI technology and a wide array of new possibilities. From singing and whispering to real-time translation and tutoring, OpenAI's new model is unlike anything we have seen thus far in how it interacts with the user's environment and engages in human-like conversations.
We are seeing the potential AI models have to profoundly impact people with visual impairments, language learners, and many others. It is a great step towards the future and another transformative tool in the realm of Artificial Intelligence.
What a time to be alive!
Luis Fernando Torres
Let's connect!
LinkedIn • Kaggle • HuggingFace
Like my content? Feel free to Buy Me a Coffee!