AI Art Generator

Artificial Intelligence (AI) has been used to create art since the 1950s, when computer algorithms were first developed to generate visual patterns. AARON, developed by Harold Cohen beginning in the late 1960s, is one of the first significant AI art systems. It is a notable example of the "Good Old Fashioned Artificial Intelligence" (GOFAI) era of programming, as it used a symbolic, rule-based approach to generate images. The technology has come a long way since then.

In recent years, AI art generators have become increasingly popular among artists, as they provide a level of creativity and control that would otherwise be impossible. Tools such as Midjourney and BlueWillow have particularly revolutionized this field.

These tools are designed to learn specific aesthetics by analyzing millions of images; the algorithms behind them then generate new images that reflect the aesthetics they have learned.

What is an AI Art Generator?

AI art generators are computer programs designed to create art from data inputs such as photos, videos, or even text. They use a variety of algorithms, such as Generative Adversarial Networks (GANs), Latent Diffusion Models, and Variational Autoencoders (VAEs), to generate art.

How Does an AI Art Generator Work?

Extensive research has been done in the field of art generation, and a range of algorithms and their variations have been designed to achieve state-of-the-art results.

Generative Adversarial Networks: AICAN, a novel generative architecture built on the Creative Adversarial Network (CAN), uses GANs to generate art. In its creators' words, the model works on the idea that "The process simulates how artists digest prior artworks until, at some point, they break out of established styles and create new styles."

The model is trained between two opposing forces: one urges the machine to follow the aesthetics of the art it is given as input (minimizing deviation), while the other penalizes the machine if it copies an already established style (maximizing style ambiguity). Together, these opposing forces ensure that the generated art is novel but does not deviate too far from the given aesthetic features.
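To make the two forces concrete, here is a minimal PyTorch sketch of what such a generator loss could look like. The discriminator heads (`real_logit`, `style_logits`) are illustrative assumptions for this sketch, not AICAN's actual code:

```python
import torch
import torch.nn.functional as F

# Sketch of the two opposing CAN-style generator losses. Assumes a
# discriminator with two heads: `real_logit` (is this art from the
# training distribution?) and `style_logits` (scores over K established
# style classes). Both names are hypothetical, for illustration only.

def can_generator_loss(real_logit, style_logits):
    # Force 1: follow the learned aesthetics -- reward the generator
    # when the discriminator believes its output is real art.
    art_loss = F.binary_cross_entropy_with_logits(
        real_logit, torch.ones_like(real_logit))

    # Force 2: maximize style ambiguity -- push the predicted style
    # distribution toward uniform, so no established style fits well.
    log_probs = F.log_softmax(style_logits, dim=-1)
    k = style_logits.shape[-1]
    ambiguity_loss = -(log_probs / k).sum(dim=-1).mean()  # CE vs. uniform

    return art_loss + ambiguity_loss
```

Minimizing the first term pulls the output toward the training aesthetics; minimizing the second pushes it away from any single established style.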

[Figure: design of the CAN network]

Latent Diffusion Model: Stable Diffusion, a popular text-to-image generation tool, uses a latent diffusion model.

[Image source: https://www.marktechpost.com/2022/11/14/how-do-dall%C2%B7e-2-stable-diffusion-and-midjourney-work/]

Diffusion models take an input (e.g., an image) and sequentially add noise to it until the image is unrecognizable. From this unrecognizable image, they then try to regenerate the original, and in the process learn how to generate pictures (or other forms of data).
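A convenient property of this forward (noising) process is that it has a closed form, so any noise level can be reached in a single step. Here is a minimal PyTorch sketch; the schedule values (`num_steps`, the `betas` range) are typical DDPM-style choices used purely for illustration:

```python
import torch

# Forward (noising) process of a DDPM-style diffusion model.
num_steps = 1000
betas = torch.linspace(1e-4, 0.02, num_steps)    # noise schedule (illustrative)
alphas_bar = torch.cumprod(1.0 - betas, dim=0)   # cumulative signal retention

def add_noise(x0, t):
    """Jump straight to noising step t via the closed form."""
    noise = torch.randn_like(x0)
    a_bar = alphas_bar[t]
    x_t = a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * noise
    return x_t, noise  # the network is trained to predict `noise` from x_t
```

At training time, the model sees `x_t` and learns to predict the noise that was added; at generation time, it runs this process in reverse, starting from pure noise.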

Applying such diffusion models directly to pixels is extremely compute-intensive; for this reason, they are instead applied in the latent space of pre-trained autoencoders. Training diffusion models in this latent space strikes an optimal balance between reduced complexity and preservation of detail. Furthermore, adding a cross-attention layer to the architecture turns it into a powerful generator for general conditioning inputs, such as text.
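In practice, all of these pieces (the pre-trained autoencoder, the latent denoiser, and the cross-attention text conditioning) ship pre-assembled. A minimal usage sketch with Hugging Face's diffusers library, assuming it is installed, a CUDA GPU is available, and the named public Stable Diffusion checkpoint can be downloaded:

```python
import torch
from diffusers import StableDiffusionPipeline

# The pipeline bundles the autoencoder (VAE), the latent U-Net denoiser,
# and the cross-attention text encoder described above.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16)
pipe = pipe.to("cuda")

image = pipe("an oil painting of a lighthouse at dusk").images[0]
image.save("lighthouse.png")
```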

It is speculated that Midjourney, too, uses a variation of the latent diffusion model.

Variational Autoencoders:

[Image source: https://towardsdatascience.com/understanding-variational-autoencoders-vaes-f70510919f73]

A variational autoencoder is a special kind of autoencoder whose training process is regularized so that the latent space has "good" properties, giving the decoder generative capabilities.

Instead of encoding an input as a single point, a VAE encodes it as a distribution over the latent space.

The encoded distribution is chosen to be normal, so the encoder is trained to return a mean and a covariance matrix. The model requires the latent space to be "continuous" (nearby points decode to similar outputs) and "complete" (any point sampled from the distribution decodes to a meaningful output), properties that the assumption of normality helps satisfy. The training process, sketched in code after the list below, is as follows:

  1. Input is encoded as a normal distribution over the latent space

  2. A point from the latent space is sampled from that distribution

  3. The sampled point is decoded and the reconstruction error is computed

  4. The reconstruction error is backpropagated through the network
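Here is a minimal PyTorch sketch of those four steps. The layer sizes are arbitrary placeholders, and the encoder outputs a diagonal covariance via its log-variance:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VAE(nn.Module):
    def __init__(self, input_dim=784, latent_dim=16):
        super().__init__()
        self.enc = nn.Linear(input_dim, 256)
        self.mu = nn.Linear(256, latent_dim)      # mean of the latent normal
        self.logvar = nn.Linear(256, latent_dim)  # log-variance (diagonal cov.)
        self.dec = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.ReLU(),
            nn.Linear(256, input_dim), nn.Sigmoid())

    def forward(self, x):
        # Step 1: encode the input as a normal distribution (mu, logvar).
        h = F.relu(self.enc(x))
        mu, logvar = self.mu(h), self.logvar(h)
        # Step 2: sample a latent point with the reparameterization trick,
        # which keeps the sampling step differentiable.
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
        # Step 3 (first half): decode the sampled point.
        return self.dec(z), mu, logvar

def vae_loss(x, x_hat, mu, logvar):
    # Step 3 (second half): reconstruction error...
    recon = F.binary_cross_entropy(x_hat, x, reduction="sum")
    # ...plus a KL term that regularizes the latent distribution toward a
    # standard normal -- this is what makes the space continuous and complete.
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl  # Step 4: backpropagate this total loss
```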

DALL·E 2 vs Stable Diffusion vs Midjourney

While exploring these three options, we noticed that Midjourney's results were consistently rich in color and artistic in style. DALL·E 2's results varied widely across prompts, ranging from really good to average at best.
Stable Diffusion gave realistic results for human images but often lagged for other kinds of images. It should be noted that Stable Diffusion has large community support because of its open-source nature; as such, considerable improvements can be expected in the months to come.

Applications of AI Art Generators

AI art generators can be of use in a range of fields such as art, photography, and filmmaking. One particularly interesting field to which AI art generation tools like Midjourney can contribute is architecture.

According to one study, images created using Midjourney portray buildings that are not only visually appealing but also technically correct in aspects not covered by the input prompt. However, the tool's capabilities in autonomous decision-making are limited: ultimately, it is the user's responsibility to provide relevant inputs and judge the quality of the final output. A generated image represents an initial stage of creation upon which the architect can add refinements. As for copyright, the creators of Midjourney attribute full rights to the images to the user.

Conclusion

The world of AI-generated art is constantly expanding, and with the emergence of cutting-edge models and software, it’s evolving at an unprecedented rate. From entertainment and architecture to photography and beyond, the potential for AI art generation to revolutionize various fields is truly remarkable.

As we look toward the future, the possibilities for AI-generated art seem endless. The ever-evolving nature of this technology promises to bring forth even more groundbreaking applications, leaving us in awe of the creative potential that AI can unleash.

[Image gallery: Midjourney at its creative best]
