After OpenAI’s DALL-E 2 or, in a different register, Microsoft’s XiaoIce, text-to-image tools are currently in the spotlight, powered by rather surprising artificial intelligence (AI) algorithms. This is the case of Imagen, a new Google project that creates images from descriptive text…
Do you know Imagen? It is a Google R&D project that, given a description combining a number of terms, creates images representative of that source text.
Here is what is explained on the official website: Imagen is “a text-to-image diffusion model with an unprecedented degree of photorealism and a deep level of language understanding. Imagen builds on the power of large transformer language models in understanding text and on the strength of diffusion models in high-fidelity image generation. Our key discovery is that generic large language models (e.g. T5), pretrained on text-only corpora, are surprisingly effective at encoding text for image synthesis: increasing the size of the language model in Imagen boosts both sample fidelity and image-text alignment much more than increasing the size of the image diffusion model. Imagen achieves a new state-of-the-art FID score of 7.27 on the COCO dataset, without ever training on COCO, and human raters find Imagen samples to be on par with the COCO data itself in image-text alignment. To assess text-to-image models in greater depth, we introduce DrawBench, a comprehensive and challenging benchmark for text-to-image models. With DrawBench, we compare Imagen with recent methods including VQ-GAN+CLIP, Latent Diffusion, and DALL-E 2, and find that human raters prefer Imagen over the other models in side-by-side comparisons, both in terms of sample quality and image-text alignment.”
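The FID (Fréchet Inception Distance) score mentioned above compares the statistics (mean and covariance) of deep image features extracted from generated images versus real ones: the lower the distance, the closer the generated images are to the real distribution. As a rough illustration of the metric itself (not of Imagen’s actual evaluation pipeline, which computes these statistics over Inception-network features of COCO images), here is a minimal NumPy sketch of the Fréchet distance between two Gaussians; the function and variable names are ours, for illustration only:

```python
import numpy as np

def psd_matrix_sqrt(m):
    """Square root of a symmetric positive semi-definite matrix
    via eigendecomposition (clipping tiny negative eigenvalues)."""
    vals, vecs = np.linalg.eigh(m)
    return vecs @ np.diag(np.sqrt(np.clip(vals, 0.0, None))) @ vecs.T

def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Fréchet distance between two Gaussians N(mu1, sigma1) and N(mu2, sigma2):
    ||mu1 - mu2||^2 + Tr(sigma1 + sigma2 - 2 * sqrt(sigma1 * sigma2)).
    The cross term is computed in the symmetric form sqrt(s2^1/2 s1 s2^1/2)."""
    s2_half = psd_matrix_sqrt(sigma2)
    covmean = psd_matrix_sqrt(s2_half @ sigma1 @ s2_half)
    return float(
        np.sum((mu1 - mu2) ** 2)
        + np.trace(sigma1) + np.trace(sigma2)
        - 2.0 * np.trace(covmean)
    )

# Identical feature statistics give a distance of 0; shifting the mean
# of one distribution by 1 along one axis (identity covariances) gives 1.
print(frechet_distance(np.zeros(2), np.eye(2), np.zeros(2), np.eye(2)))        # → 0.0
print(frechet_distance(np.zeros(2), np.eye(2), np.array([1.0, 0.0]), np.eye(2)))  # → 1.0
```

In the real FID computation, `mu` and `sigma` are estimated from thousands of feature vectors per image set; the formula itself is exactly the one sketched here.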
A system that is still very basic and not very usable
For now, the system is quite simple and only lets you create images that meet certain criteria selected from a predefined list. Here are some example images, each followed by the prompt text that produced it:
A magnificent oil painting of a raccoon queen wearing a red French royal dress. The picture hangs on an ornate wall decorated with wallpaper. Source: Imagen
A marble statue of a koala DJ in front of a marble statue of a gramophone. The koala wears large marble headphones. Source: Imagen
A bucket bag made of blue suede. The bag is decorated with intricate gold paisley patterns. The handle of the bag is made of rubies and pearls. Source: Imagen
A huge cobra snake on a farm. The snake is made of corn. Source: Imagen
Do you see the concept? Of course, these are deliberately outlandish examples, and it is a safe bet that you will very rarely need a picture like this in real life… 🙂
What is more interesting is to imagine what will be possible later on in terms of illustration (especially in fields such as animation and advertising, but not only), once these algorithms mature and can be used at full scale.
Maybe SEO practitioners will even step in and try to understand how certain images were created, in order to position themselves with the same source text. A kind of “reverse engineering” in metaverse mode? Who knows what the evolution of SEO will look like in the years to come? In any case, these are tools to follow, both for their promise and for the possible excesses they could generate…