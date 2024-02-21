

Earlier this month, Alphabet Inc.'s latest "experimental" AI model, Google Gemini 1.5 Pro, was released to select users, such as developers and enterprise customers, via the company's GenAI dev tool AI Studio. The latest iteration goes well beyond previous versions and can process more than 100,000 words at once.

What Happened: Over the weekend, Rowan Cheung, the founder of The Rundown AI, took to X, formerly Twitter, and shared a series of posts explaining six "impressive capabilities" he found that users can try with Gemini 1.5 Pro.

First: Understanding Long Videos

Cheung uploaded "the entire NBA dunk contest" from Saturday night and prompted Gemini 1.5 Pro to answer "which dunk had the highest score."

According to the Rundown AI founder, the model found "the specifical perfect 50 dunk and details from just its long context video understanding."

Second: Understanding Full Movie Transcripts

The AI maestro then asked Gemini 1.5 Pro to "compare and contrast" movie transcripts of "Interstellar" and "Ad Astra" to help him decide which one he should watch next.

Google's latest AI model "was able to understand, compare, and contrast entire transcripts from both movies" easily. All he had to do was upload the transcripts and give the right prompt.

Third: Passing The Language Barrier

Cheung used Gemini 1.5 Pro to translate text into a language spoken by fewer than 2,000 people. He translated a newsletter from English into Saterlandic, "following a full linguistic manual at inference time."

Fourth: Distinguishing Fake And Real Videos

Cheung also asked Google's Gemini 1.5 Pro to discern whether a video by OpenAI's Sora was produced by AI or not. All he had to do was upload the video and prompt, "Could this video be AI generated?"

While Gemini 1.5 Pro did not give an accurate answer, it did highlight "key factors of why it could be AI-generated."

Fifth: Making It Easier To Understand Nuances From Long Documents

Cheung used the AI model to extract “Table 8” from the Gemini 1.5 Pro paper authored by DeepMind. Gemini 1.5 Pro could find, understand, and explain a "small figure in a long paper."

Sixth: Getting A Personalized Review Of A Movie

The Rundown AI chief also used Gemini 1.5 Pro to get a personalized review of Christopher Nolan's "Interstellar." He uploaded the movie's transcript and asked the AI model to extract the three most significant quotes from it – which the model did brilliantly.

Why It's Important: Although Gemini 1.5 Pro is not yet accessible to the public, Google DeepMind granted Cheung early access to the model.

This advanced model can take in significantly more data at once than its predecessor, Gemini 1.0 Pro. It can handle approximately 700,000 words or 30,000 lines of code, roughly 35 times what Gemini 1.0 Pro can process.

Moreover, the model’s functionality extends beyond text processing. It can manage up to 11 hours of audio or an hour of video in multiple languages.

This expanded capability is made possible by Gemini 1.5 Pro’s support for up to one million tokens, with Google reporting successful tests with up to 10 million tokens.

However, it’s worth noting that the current version of Gemini 1.5 Pro, accessible to most developers and customers, is limited to processing around 100,000 words at a time.
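
For readers who want to see how these figures relate to one another, the arithmetic can be sketched in a few lines of Python. This is a back-of-the-envelope illustration only: the words-per-token ratio is simply what the article's rounded numbers imply, not a property of Google's tokenizer, and the function and constant names are made up for this example.

```python
# Approximate figures as quoted in the article.
WORDS_AT_FULL_CONTEXT = 700_000   # words Gemini 1.5 Pro can handle
FULL_CONTEXT_TOKENS = 1_000_000   # supported context window, in tokens
REPORTED_MULTIPLE = 35            # stated increase over Gemini 1.0 Pro
STANDARD_WORD_LIMIT = 100_000     # limit for most developers, in words


def words_per_token(words: int, tokens: int) -> float:
    """Ratio implied by the quoted figures (an assumption, not a tokenizer fact)."""
    return words / tokens


def implied_predecessor_words(words: int, multiple: int) -> float:
    """Word capacity of the predecessor if the quoted multiple is taken literally."""
    return words / multiple


def implied_tokens(words: int, ratio: float) -> float:
    """Convert a word count to tokens using the implied ratio."""
    return words / ratio


ratio = words_per_token(WORDS_AT_FULL_CONTEXT, FULL_CONTEXT_TOKENS)
predecessor_words = implied_predecessor_words(WORDS_AT_FULL_CONTEXT, REPORTED_MULTIPLE)
standard_tokens = implied_tokens(STANDARD_WORD_LIMIT, ratio)

print(f"Implied words per token: {ratio:.2f}")
print(f"Implied Gemini 1.0 Pro capacity: {predecessor_words:,.0f} words")
print(f"Implied tokens for 100,000 words: {standard_tokens:,.0f}")
```

Because every input is a rounded figure, the results are rough estimates rather than official limits.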

