[Image: GPT-4 stock photo. Credit: Calvin Wankhede / Android Authority]
  • Google released a hands-on video demonstrating the voice response capabilities of Gemini in “real time.”
  • Google later admitted that the video demo didn’t actually happen in real time with spoken prompts.
  • A YouTuber used GPT-4 Vision to recreate the Gemini demo and do it in real time.

After Google released its impressive Gemini hands-on demo video, the demo turned out to be a little too good to be true. But now someone has recreated it with GPT-4 Vision, accomplishing what Gemini couldn’t do in its video.

Google’s Gemini is the company’s most powerful suite of large language models (LLMs) to date, and its biggest shot at OpenAI’s GPT-4 architecture. In an attempt to show off just how capable its multimodal LLM is, Google released a hands-on video of Gemini supposedly responding to voice prompts in real time. Initially, the demo seemed pretty impressive, but viewers eventually discovered a disclaimer that said latency had been reduced and Gemini’s outputs had been shortened for brevity.