- Whisk generates AI images by combining visual inputs for a subject, a scene, and a style.
- It uses Gemini and Imagen 3 to reinterpret the uploaded images.
- You can tweak the underlying prompts to refine the final output.
AI image generators are a modern marvel, but you can't always find the right words to describe your creative vision. Google has introduced Whisk for just such occasions. This new experimental tool from Google Labs skips the traditional text-prompt approach to generative AI. Instead, users upload images for the subject, scene, and style to create unique results.
Unveiling Whisk in a Labs blog post, Google explains how it works. Once you've uploaded two or three images, Gemini analyzes them and generates detailed captions describing the key characteristics of each input. In effect, Whisk writes the text prompt for you by describing your images. Imagen 3, Google's latest image generation model, then uses these captions to generate a new image that blends the provided subject, scene, and style.
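To make the caption-then-generate flow concrete, here is a minimal Python sketch. It is a hypothetical illustration, not Whisk's code or API: `caption_fn` and `generate_fn` are placeholder callables standing in for Gemini and Imagen 3, and the `Subject: ... Scene: ... Style: ...` prompt format is an assumption about how captions might be merged into one editable prompt.

```python
from typing import Callable, Dict, Tuple

# Input roles described in the article; the fixed order puts the subject first.
ROLES = ("subject", "scene", "style")


def compose_prompt(captions: Dict[str, str]) -> str:
    """Merge per-role captions into one text prompt (hypothetical format)."""
    parts = [
        f"{role.capitalize()}: {captions[role].strip()}"
        for role in ROLES
        if captions.get(role)
    ]
    if not parts:
        raise ValueError("at least one captioned input is required")
    return ". ".join(parts) + "."


def run_pipeline(
    images: Dict[str, bytes],
    caption_fn: Callable[[bytes], str],
    generate_fn: Callable[[str], bytes],
) -> Tuple[str, bytes]:
    # Step 1: describe each uploaded image (the role the article gives Gemini).
    captions = {role: caption_fn(img) for role, img in images.items() if role in ROLES}
    # Step 2: merge the captions into a single prompt the user could tweak.
    prompt = compose_prompt(captions)
    # Step 3: pass the prompt to an image generator (the role of Imagen 3).
    return prompt, generate_fn(prompt)


if __name__ == "__main__":
    # Stub models: "captioning" just decodes bytes, "generation" echoes the prompt.
    prompt, image = run_pipeline(
        {"subject": b"a red fox", "scene": b"a snowy forest", "style": b"watercolor"},
        caption_fn=lambda img: img.decode(),
        generate_fn=lambda p: p.encode(),
    )
    print(prompt)  # Subject: a red fox. Scene: a snowy forest. Style: watercolor.
```

Because the intermediate prompt is plain text, a user could edit it between steps 2 and 3 before regenerating, which mirrors the summary's point that the underlying prompts can be tweaked.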