Text-to-Image generation with Stable Diffusion

Text-to-image generation uses artificial intelligence to create images from textual descriptions. It is one of the hottest topics in AI right now, and many new tools are trying to carve out a place for themselves. One such tool we came across is Stable Diffusion, so in this article we experiment with it and share our experience!

What is Stable Diffusion?

Stable Diffusion is open source and free to use. It is a deep learning text-to-image model released in 2022 by researchers and engineers from CompVis, Stability AI and LAION. It was trained on 512×512 images from a subset of LAION-5B, which, for your information, is the largest freely accessible multimodal dataset!
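Stable Diffusion is a latent diffusion model: rather than denoising pixels directly, it works on a compressed latent representation produced by an autoencoder. Assuming the autoencoder settings of Stable Diffusion v1 and v2 (each side downsampled by a factor of 8, 4 latent channels), the sketch below computes the latent size for the 512×512 training resolution and for the 768×768 images that appear later in this article. The helper names are our own.

```python
# Assumed Stable Diffusion v1/v2 autoencoder settings: each spatial side
# is downsampled by 8, and images are encoded into 4 latent channels.
VAE_SCALE_FACTOR = 8
LATENT_CHANNELS = 4


def latent_shape(height: int, width: int) -> tuple[int, int, int]:
    """Return (channels, latent_height, latent_width) for an image size."""
    if height % VAE_SCALE_FACTOR or width % VAE_SCALE_FACTOR:
        raise ValueError("height and width must be multiples of 8")
    return (LATENT_CHANNELS, height // VAE_SCALE_FACTOR, width // VAE_SCALE_FACTOR)


def compression_ratio(height: int, width: int) -> float:
    """Number of RGB pixel values per latent value."""
    channels, lat_h, lat_w = latent_shape(height, width)
    return (3 * height * width) / (channels * lat_h * lat_w)


# Training resolution (512) and the output size we saw in our tests (768).
for size in (512, 768):
    print(size, latent_shape(size, size), compression_ratio(size, size))
```

For both sizes the latent tensor holds 48 times fewer values than the RGB image it decodes to.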

How do you generate images with Stable Diffusion?

The user interface we used for Stable Diffusion is very simple: the only input is a text prompt describing the image. So let’s throw some prompts at Stable Diffusion and see how well it turns them into images.

For the prompt “a boy playing football on the mars”, Stable Diffusion generated the images below.

Boy playing football on Mars

Stable Diffusion returned four square (1:1, 768×768) images. In two of them the background looks realistic, while the other two are hard to match to the prompt. The model picked out the key objects (a boy, a football, Mars) and produced four different images with similar meanings. The second and last images fit Mars well, as they show a moon in the sky. The first image shows the boy in an astronaut suit, suggesting he is playing on another planet, while the other three images show the red soil of Mars.

Next, we tried a realistic prompt: “Himalayan mountains”. Here is the output it generated.

Himalayan mountains

Because the prompt describes a real place, the results are literal, photo-like images. They are high definition and portray the prompt very well; here the model generates realistic images with great precision.

Stable Diffusion can handle Image-to-Image Generation

Stable Diffusion can also regenerate an existing image. In the interface we used, you can paste the URL of any image into the prompt, and it will generate new images based on that image.

We entered the following image URL in the prompt to see how closely Stable Diffusion could reproduce it: https://image.shutterstock.com/image-photo/super-moon-colorful-sky-cloud-260nw-1044034966.jpg

These are the images Stable Diffusion generated from that base image.

Stable Diffusion took the basic elements of the image, such as the moon and the clouds, and generated similar images. However, it quite evidently left out the water, which was prominent in the original image. So although Stable Diffusion can read an image and turn it into new images, there is still room for improvement here.
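We don't know how the interface we used processes the input image internally. In the image-to-image approach implemented, for example, by the img2img pipeline in Hugging Face's diffusers library, the input image is noised part of the way and then denoised for only the tail of the schedule, controlled by a `strength` value between 0 and 1. The sketch below reproduces just that step-selection arithmetic on a made-up 20-step schedule; the function name and the schedule values are hypothetical.

```python
def img2img_timesteps(timesteps: list[int], strength: float) -> list[int]:
    """Return the part of a descending denoising schedule that img2img runs.

    strength=1.0 runs the whole schedule, so the input image is almost
    entirely replaced by noise; smaller values start later in the schedule
    and keep more of the original picture.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be in [0, 1]")
    n = len(timesteps)
    init_steps = min(int(n * strength), n)  # steps actually denoised
    t_start = max(n - init_steps, 0)        # skip the noisiest steps
    return timesteps[t_start:]


# Hypothetical 20-step schedule over 1000 training timesteps: 950, 900, ..., 0
schedule = list(range(950, -1, -50))

for strength in (0.25, 0.5, 0.75, 1.0):
    steps = img2img_timesteps(schedule, strength)
    # The first kept timestep sets how much noise is added to the input image.
    print(f"strength={strength}: {len(steps)} steps, starting at t={steps[0]}")
```

Under this scheme, a higher strength starts from a noisier version of the input and gives the model more room to redraw the scene, which could explain why a prominent detail like the water was lost; we could not see which strength the interface used.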

How good is Stable Diffusion’s prompt generator?

Stable Diffusion also offers a prompt generator for those who find it difficult to write precise prompts. You write a short text, and the feature expands it into a more detailed, embellished prompt, which is then used to generate the images. This sounds very interesting, so let’s try it out. You can head over to the prompt generator and try this feature yourself.

We started with the short prompt “cozy coffee shop on a rainy day”, and the prompt generator produced several detailed, embellished prompts from it. Notice that the rain mentioned in the prompt is not reflected in the pictures, which shows the model’s lack of precision.
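We don't know how Stable Diffusion's prompt generator builds its prompts. As a purely hypothetical illustration of the idea, the sketch below turns a short prompt into several embellished variants by appending randomly chosen style modifiers; the modifier list and the `expand_prompt` helper are our own inventions, not the tool's implementation.

```python
import random

# Hypothetical style modifiers; the real prompt generator's vocabulary is unknown.
MODIFIERS = [
    "highly detailed",
    "warm cinematic lighting",
    "digital painting",
    "trending on artstation",
    "8k",
    "soft focus background",
]


def expand_prompt(prompt: str, variants: int = 3, extras: int = 3, seed: int = 0) -> list[str]:
    """Return `variants` embellished prompts built from a short prompt."""
    rng = random.Random(seed)  # seeded so the output is reproducible
    base = prompt.strip()
    results = []
    for _ in range(variants):
        chosen = rng.sample(MODIFIERS, k=extras)  # distinct modifiers per variant
        # Keep the user's wording first so the core subject stays in the prompt.
        results.append(", ".join([base, *chosen]))
    return results


for p in expand_prompt("cozy coffee shop on a rainy day"):
    print(p)
```

Because each variant keeps the original wording, details such as “rainy day” stay in the prompt text; as our results show, though, a detail being present in the prompt does not guarantee it appears in the image.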

Limitations of Stable Diffusion

Generative tools are exciting and pretty helpful, but no tool is perfect. We ran into several challenges while working with Stable Diffusion.

  1. Speed of image generation: Generating images took longer than expected; each prompt took more than a minute to process.
  2. Generating human limbs: Stable Diffusion struggles to render human limbs, which we attribute to the poor quality of limb images in the LAION dataset. The image below is a perfect example of how poorly Stable Diffusion renders human figures.
  3. Image aspect ratio: The interface we used generated only square images, so it is not suited to other aspect ratios.
  4. Lack of precision: The images this model produces often fail to reflect the essence of the text, which shows a lack of precision in converting text to images.
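Square output is a limitation of the interface we used rather than a hard rule of the model. In common implementations such as Hugging Face's diffusers pipelines, width and height only need to be multiples of 8, although image quality can drop at sizes far from the training resolution. The sketch below picks a width and height for a requested aspect ratio with roughly the same pixel count as 512×512; the function name and the pixel-budget heuristic are our own assumptions, not part of any Stable Diffusion tool.

```python
import math

# Assumption: as in Hugging Face diffusers' Stable Diffusion pipelines,
# width and height must both be divisible by 8.
MULTIPLE = 8


def size_for_aspect(aspect_w: int, aspect_h: int,
                    target_pixels: int = 512 * 512) -> tuple[int, int]:
    """Return (width, height) for an aspect ratio with about `target_pixels` pixels.

    Both sides are rounded down to a multiple of 8, so the result never
    exceeds the pixel budget.
    """
    width = math.sqrt(target_pixels * aspect_w / aspect_h)
    height = math.sqrt(target_pixels * aspect_h / aspect_w)
    return (int(width) // MULTIPLE * MULTIPLE, int(height) // MULTIPLE * MULTIPLE)


for ratio in [(1, 1), (16, 9), (3, 4)]:
    w, h = size_for_aspect(*ratio)
    print(f"{ratio[0]}:{ratio[1]} -> {w}x{h}")
```

Rounding down rather than to the nearest multiple keeps the pixel count at or below the 512×512 budget, which stays close to the resolution the model was trained on.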

Conclusion

We conclude that Stable Diffusion is a creative tool that can help create images from scratch, but in our tests it fell short when we needed descriptive, detailed and precise images. Its features help it stand out in the market, but the quality problems we ran into hold back its capabilities.

These are our opinions on the Stable Diffusion model. Opinions can easily differ, and we would love to hear yours in the comments below. This is a new era of AI and machine learning, so stay tuned as we dig into the AI world together!
