ChatGPT with Vision Surpasses Google Bard: Exploring the Future of AI-Powered Conversations

Artificial intelligence is advancing rapidly across industries. OpenAI, based in San Francisco, has been at the forefront since it introduced ChatGPT in November 2022. That launch set off a race among tech giants to build the best AI capabilities into their products and services. OpenAI has maintained its lead by continuously improving ChatGPT.

On September 25, 2023, OpenAI, led by Sam Altman, announced new voice and image capabilities for its chatbot. This means you can now have voice conversations with the chatbot and share pictures with it. It’s the first time OpenAI has added these features.

The company explained, “Voice and image capabilities give you more ways to use ChatGPT in your daily life. For example, you can take a picture of a famous landmark while traveling and have a live conversation about it. At home, you can snap pictures of your fridge and pantry to figure out what ingredients you have for dinner, and even ask for a step-by-step recipe. After dinner, you can use it to help your child with a math problem by taking a photo of the problem, circling it, and getting hints for both of you.” These features are available to ChatGPT Plus and Enterprise users, and voice features work on iOS and Android devices.

In July, Google updated its chatbot, Google Bard, in an effort to stay ahead of competitors like OpenAI, which is backed by Microsoft, and Anthropic. Google Bard’s updates included the ability to analyze images, provide different styles of responses, support more languages, and more. However, with ChatGPT Vision, OpenAI has once again shown that it’s a leader in AI innovation. The excitement surrounding ChatGPT’s new features is similar to the excitement generated when it was first introduced to the public in November 2022.

What makes it so important?

ChatGPT with vision isn’t available to everyone yet, but those who have it are doing some really impressive things with this new feature. These new abilities are making it one of the most exciting AI product announcements we’ve seen in a while. People are finding many different ways to use this new tool, and there are plenty of practical uses to explore once it becomes available to more people.

Exploring Visual Research with ChatGPT Vision

An AI enthusiast named Rowan Cheung uploaded a picture to ChatGPT and asked where it was taken. ChatGPT gave an accurate response, saying the photo seemed to be taken from inside a cave overlooking a coastline with a winding road and that, based on the scenery, it might be Makapu’u Point in Hawaii.

Cheung was impressed and tweeted that ChatGPT’s image recognition can uncover hidden gems. Other users on Twitter have also shared similar examples, like asking for locations or identifying animals in pictures. So far, ChatGPT Vision appears to be doing a good job at these tasks.

ChatGPT image recognition can find hidden gems. pic.twitter.com/9GMKgIT5p0
— Rowan Cheung (@rowancheung) September 28, 2023

Now that this feature is on mobile devices, many people are likely to use it. In the future, it could become a common way to learn about things when you’re traveling. Just imagine, you see something interesting, and you can point your phone at it, ask ChatGPT what it is, and get information about it.

Interior Design with ChatGPT Vision

Another AI expert, Pietro Schirano, has been experimenting with ChatGPT Vision. In one of his tweets, Schirano shared a photo of his room and asked how he could make it better. ChatGPT gave several suggestions, including improving the colors, adding plants, adjusting the lighting, and incorporating artwork, among other ideas.

Custom instructions are a feature that allows users to tell ChatGPT more about themselves, which helps ChatGPT understand context when answering future questions. Their effect is clear in the bot’s response, especially when it suggests adding artwork to the room. It says, “Considering your background in classical studies and art, maybe adding some artwork on the walls could be a nice personal touch. You could use prints of classical artworks or something modern to blend the old with the new.”

GPT-4 vision for interior design. 🏠

I love how it's incorporating what it knows about me in the suggestion because of custom instructions.

Really incredible technology. pic.twitter.com/aAFI5ZgPLW
— Pietro Schirano (@skirano) September 28, 2023

ChatGPT Vision as a Skilled Developer

Schirano also showed how ChatGPT Vision can write code and create websites. In a video, he gave ChatGPT a picture of a user interface and asked it to recreate the design without missing anything. The bot generated the code, which he quickly exported to a development environment, turning the image into a live website in less than a minute with GPT-4 Vision.

Another user, McKay Wrigley, did something similar. He provided a screenshot of a software dashboard and asked the bot to generate code for it. ChatGPT Vision transformed the screenshot into a working prototype in just a few minutes. Wrigley also demonstrated that by showing ChatGPT a picture of a team’s whiteboard session, it could be prompted to generate code. This video got nearly 10 million views.

You can give ChatGPT a picture of your team’s whiteboarding session and have it write the code for you.

This is absolutely insane. pic.twitter.com/bGWT5bU8MK
— Mckay Wrigley (@mckaywrigley) September 27, 2023

Minimizing the separation between thoughts and execution

ChatGPT Vision can also do something truly impressive: it can read and explain complex diagrams.

For example, one user, Sean Spriggens, shared an incredibly detailed diagram from the Pentagon. This diagram had over 3,000 words and hundreds of boxes, but ChatGPT was able to understand it. What’s interesting is that ChatGPT can handle different types of diagrams, not just military ones.

ChatGPT image recognition vs "Crazy Pentagon PowerPoint Slides:"

(h/t @jonst0kes 🫡) pic.twitter.com/MX3NhTpG1n
— Sean Spriggens (@seanspriggens) September 26, 2023

Another user, Marco Mascorro, posted an electronics diagram for an Arduino design. ChatGPT Vision instantly recognized it as an electronic circuit and explained how the different parts were connected and worked together.

This is also a fantastic tool for education. Users can ask ChatGPT for more information about the diagrams they’re exploring. It’s like having a conversation with a machine. However, there’s a downside to this. Some people are using it to get answers to homework questions, which has led to concerns that students might stop doing their homework.

But experts believe that if teachers give students exercises that ChatGPT can’t do, it can make education more valuable. So, while ChatGPT is impressive, there are still important tasks that humans can do better.

Additional Information About the New Features of ChatGPT

The new voice and image features make ChatGPT easier to use. Now, you can talk to the chatbot using your voice or show it pictures, which is pretty cool. These features are a big deal in the world of AI because they can be helpful in everyday conversations. For example, you can discuss places to visit or get dinner suggestions based on what’s in your kitchen. Plus, the text-to-speech feature sounds really human-like.

When it comes to accuracy, the web browsing feature isn’t always perfect. But ChatGPT with vision has shown its value in real-life situations. Recent research even showed that it can identify problems in manufacturing, create medical scan reports, and assess vehicle damage, among other things. So, despite occasional mistakes, GPT-4 with vision is a big step forward toward an AI assistant that understands pictures. You can try the vision features using Bing Chat and GPT-4 to make your tasks easier.

OpenAI is being careful as it rolls out these features. They’re making sure they’re safe and don’t cause any problems. The vision-based models have gone through a lot of testing. And OpenAI is also working with others, like ‘Be My Eyes,’ to make these features useful for visually impaired people. OpenAI is being open about any mistakes or problems, especially when it comes to images with people. They say they’re taking steps to protect your privacy.
