OpenAI Just Broke the Internet | by Luís Fernando Torres | May 2024

May 13th, 2024 marks the day OpenAI revealed GPT-4o to the world. This might be the most impressive product OpenAI has ever announced. In this article, we’ll explore a bunch of awesome use cases showcased throughout the day on OpenAI’s YouTube channel.

GPT-4o—“o” standing for “omni”—is yet another leap towards a more natural Artificial Intelligence model. You can communicate with this model via any combination of text, audio, and image inputs and get outputs in any of these formats as well. According to OpenAI, it can respond to audio inputs in as little as 232 milliseconds, with an average response time of 320 milliseconds, which is comparable to human response times in conversation. Not only that, but it is also 2x faster and 50% cheaper in the API compared to gpt-4-turbo.
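To make the multimodal claim concrete, here is a minimal sketch of what a text-plus-image request to GPT-4o could look like in the chat completions format. The payload shape follows OpenAI's documented multimodal message structure; the image URL is a placeholder, and actually sending the request would require the `openai` package and an API key (e.g. `client.chat.completions.create(**payload)`).

```python
# Sketch of a multimodal GPT-4o request payload (chat completions format).
# The image URL below is a placeholder, not a real asset.

def build_gpt4o_request(prompt: str, image_url: str) -> dict:
    """Assemble a chat completions payload mixing text and image input."""
    return {
        "model": "gpt-4o",
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }

payload = build_gpt4o_request(
    "What is in this image?",
    "https://example.com/photo.jpg",  # placeholder
)
print(payload["model"])  # gpt-4o
```

Audio in and out goes through the same model rather than a separate speech pipeline, which is where the latency numbers above come from.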

Comparison of text evaluation performance across various models. GPT-4o is superior in most categories, including MMLU, HumanEval, and MGSM.

Let’s dive into the coolest use cases that OpenAI presented during the day!

Lullabies and Whispers

In this video you can see that not only are voice chats considerably faster than what we have now, making the conversation much more fluid and natural, but GPT-4o is also great at conveying emotions, laughing and interacting with the user in a more human-like manner.

It can also sing and whisper, adjusting its singing pace and the intensity of its whispers according to the user’s request to create a relaxing and immersive experience.

GPT-4o as a Tutor

In this video, you can see how you’re able to share your screen with GPT-4o so it can assist you with a variety of tasks. In this case, the model tutors a student through a math problem, helping him reach the correct result step by step.

This can help students across different cultures and offer them a more personalized learning experience at their own pace. It can also serve as a copilot throughout other activities like video editing and coding.

Real-Time Translation

Traveling abroad and connecting with people from different cultures and backgrounds just became easier with ChatGPT!

In the video above, you can see how users are able to leverage the speed of the new voice chat mode to act as a real-time interpreter. The model is now able to hear person A speaking English and immediately translate it into Spanish for person B, who in turn replies in Spanish so that the model can immediately translate the reply into English for person A.
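The interpreter flow above can be sketched as a simple prompting pattern: a system prompt instructs the model to translate each utterance into the other language, and every spoken line is appended to the running conversation. The prompt wording and helper function here are illustrative assumptions, not OpenAI's actual implementation; a real app would pair this with the voice mode rather than text.

```python
# Illustrative interpreter pattern (prompt and helper are assumptions).
# The model's reply to each utterance would be the translation.

INTERPRETER_PROMPT = (
    "You are a live interpreter between an English speaker and a Spanish "
    "speaker. When you hear English, repeat it in Spanish; when you hear "
    "Spanish, repeat it in English. Output only the translation."
)

def add_utterance(history: list, utterance: str) -> list:
    """Append one spoken line to the running conversation history."""
    if not history:
        history.append({"role": "system", "content": INTERPRETER_PROMPT})
    history.append({"role": "user", "content": utterance})
    return history

history = []
add_utterance(history, "Hey, how has your week been going?")
# After each turn, the model's translated reply would be appended as an
# assistant message before the next utterance is added.
```

Because GPT-4o handles audio natively, this back-and-forth happens fast enough to feel like a live interpreter rather than a transcribe-then-translate pipeline.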

Language Learning

You can now help ChatGPT “see” the world and the environment you’re in. You are able to show it objects on a table, people close to you, lamps and windows, and help the model interact with them. In this video, you can see how this new ability can be used for language learning, where GPT-4o is able to teach you how to pronounce and name different objects that are right next to you in any language you wish to learn, such as Spanish.

ChatGPT Can Now See the World

As said in the video above, GPT-4o is a model that interacts with the world through audio, vision, and text.

It is able to recognize brands on a hoodie, objects in a room, as well as infer what the user might be doing—or about to do—according to what’s going on in the environment. It is also capable of expressing human emotions, such as excitement and curiosity.

ChatGPT Meets a Dog

In this video, the model is able to recognize a dog in front of the user via the iPhone camera and interact with both the user and the dog at the same time. What is most impressive is how the model emulates human emotions and behavior, including the silly voice that a lot of people make when talking to a baby or a puppy.

Be My Eyes: Technology to Improve People’s Lives

And this might be the icing on the cake.

You’ve seen so far how this new model is fast, capable of human-like expression and emotion, able to recognize the environment around the user, help people solve math problems, sing, whisper, crack some jokes, and more.

However, this last video is by far my favorite. It showcases how GPT-4o’s new ability to see the world through a phone’s camera can significantly improve the lives of people who are blind or have other forms of vision impairment. The model is able to describe situations and the environment, and even help someone hail a taxi.

OpenAI’s GPT-4o is another step towards advanced AI technology and a wide array of new possibilities. From singing and whispering to providing real-time translation and tutoring, OpenAI’s new model is unlike anything we have seen thus far in how it interacts with the user’s environment and engages in human-like conversations.

We are seeing the potential AI models have to profoundly impact individuals with vision impairments, language learners, and much more. A great step towards the future and another transformative tool in the realm of Artificial Intelligence.

What a time to be alive!

Luis Fernando Torres

Let’s connect!🔗

LinkedIn • Kaggle • HuggingFace

Like my content? Feel free to Buy Me a Coffee ☕ !
