How Dall-E 2 and other artistic algorithms work
When the images produced by Google's DeepDream artificial intelligence software appeared in 2015, it immediately felt like a watershed moment. For the first time (at least in the perception of the general public), an artificial intelligence was treated not merely as a tool, but as a genuinely creative entity: an artist.
However controversial (and often outright rejected), this notion immediately took hold of the public, also giving rise to experiments as dubious from an artistic point of view as they were profitable from a financial one. This is the case, for example, of the work entitled Portrait of Edmond Belamy, which was sold in 2018 at a Christie's auction for $432,500. The painting depicts a nineteenth-century gentleman, perhaps a clergyman. What distinguishes this work from a classic portrait is the absence of most of the facial features, which gives the painting a more contemporary and even vaguely unsettling style. Above all, though, it is the fact that it was “created” (and signed) by the artificial intelligence employed by the collective known as Obvious.
Portrait of Edmond Belamy, Obvious
It was, however, more of a commercial move than an artistic one: Obvious deliberately selected a very homogeneous dataset to train the artificial intelligence to produce a very specific type of painting, and then chose the result considered best among the hundreds or thousands the algorithm could have produced. In short, the artificial intelligence was more a tool carefully guided by its users than an artist.
Experiments of greater artistic value are those created by authors who have made more experimental and daring use of artificial intelligence, as is the case with Mario Klingemann or Refik Anadol (recently exhibited at MEET in Milan), whose work Machine Hallucinations shows very well what kind of results can be obtained using tools based on deep learning: "Depending on the data used, the system used by Refik Anadol generates different works, which can be more or less abstract", explained Federico Perazzi, head of the AI department at Bending Spoons, during the Synapse AI Symposium held in Milan. "An interesting aspect is that very different works are created using the same technique; what changes is the dataset".
In Refik Anadol's case, it is possible to use the same tool to create very different works, for example by categorizing in different ways the hundreds of thousands of images used for training, perhaps according to their level of abstractness (the artist himself has described his working process in detail in an interview in the magazine of MoMA in New York). "The technology used is in fact always StyleGAN, a very popular tool that was also used to create the famous 'faces that do not exist'", explained Perazzi from the Synapse AI Symposium stage.
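To make Perazzi's point concrete, here is a minimal, purely hypothetical sketch in PyTorch (not Anadol's actual pipeline): the generator architecture and the sampling code stay identical, and only the training data behind each set of weights changes. The `TinyGenerator` class and the checkpoint file names are illustrative assumptions.

```python
import torch
import torch.nn as nn

class TinyGenerator(nn.Module):
    """Toy stand-in for a StyleGAN-like generator: maps a random
    latent vector z to an RGB image tensor."""
    def __init__(self, latent_dim=64, img_size=32):
        super().__init__()
        self.img_size = img_size
        self.net = nn.Sequential(
            nn.Linear(latent_dim, 256),
            nn.ReLU(),
            nn.Linear(256, 3 * img_size * img_size),
            nn.Tanh(),
        )

    def forward(self, z):
        return self.net(z).view(-1, 3, self.img_size, self.img_size)

generator = TinyGenerator()
# The architecture never changes; only the (hypothetical) checkpoints do,
# each one trained on a different dataset:
# generator.load_state_dict(torch.load("weights_nature_photos.pt"))
# generator.load_state_dict(torch.load("weights_city_archives.pt"))

z = torch.randn(4, 64)   # sample 4 random latent vectors
images = generator(z)    # 4 generated images, shape (4, 3, 32, 32)
print(images.shape)
```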
Over the last few months, however, the spotlight has been captured by new creative tools based on artificial intelligence, capable of generating all kinds of images - from the most realistic to the craziest - based on the text prompts provided by users. This is the case of Dall-E 2 (which anyone can try out in its "mini" version) and of MidJourney, algorithms capable of bringing to life anything that comes to mind, with often surprising results, as in the case of the famous image generated from the prompt "photo of an astronaut riding a horse".
"A very interesting element is that these algorithms in a certain sense recreate our own process: we first think about something and then we start making sketches to reproduce what we have in mind", explains Federico Perazzi. "By generating images from textual prompts, tools such as Dall-E 2 partially reproduce this same work."
But how do these algorithms work? How does Dall-E 2 know how a text like "teddy bear on a skateboard in New York" manifests itself in visual space? First of all, the artificial intelligence models behind Dall-E 2 have to be trained on hundreds of thousands of images, each associated with a caption describing what the image contains, thus teaching the system to match a description to the image that most likely represents it.
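This matching is the idea behind contrastive image-text training of the kind used by OpenAI's CLIP, which Dall-E 2 builds on. A minimal sketch follows; the embedding dimension, batch size and temperature are illustrative, and the random tensors stand in for the outputs of real image and text encoders. Embeddings of matching image-caption pairs are pushed together, mismatched pairs apart.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(image_emb, text_emb, temperature=0.07):
    # Normalize so the dot product becomes a cosine similarity.
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    # Similarity of every image with every caption in the batch.
    logits = image_emb @ text_emb.t() / temperature
    # The i-th image belongs with the i-th caption: the diagonal is correct.
    targets = torch.arange(len(logits))
    loss_img = F.cross_entropy(logits, targets)       # image -> caption
    loss_txt = F.cross_entropy(logits.t(), targets)   # caption -> image
    return (loss_img + loss_txt) / 2

# Toy batch: 8 image/caption embedding pairs of dimension 512.
img = torch.randn(8, 512)   # would come from an image encoder
txt = torch.randn(8, 512)   # would come from a text encoder
print(contrastive_loss(img, txt))
```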
OpenAI
At this point, things get very technical (here you will find a detailed but understandable explanation). In a nutshell, the process consists of three steps. First, the text prompt is sent to a text encoder that has been trained to map it to a numerical representation. The second step involves a model called the "prior", which associates the text encoding with a corresponding image encoding that captures the semantic information of the prompt. Finally, an "image decoder" generates an image that is a visual manifestation of that semantic information.
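The three stages can be summarized in a few lines of Python, as a sketch in which each function is a placeholder standing in for a large trained model (in the real Dall-E 2, the text encoder comes from CLIP and the image decoder is a diffusion model; everything below is a toy assumption):

```python
import torch

def text_encoder(prompt: str) -> torch.Tensor:
    """Stage 1 (placeholder): map the prompt to a text embedding."""
    torch.manual_seed(abs(hash(prompt)) % (2**31))  # deterministic toy embedding
    return torch.randn(512)

def prior(text_embedding: torch.Tensor) -> torch.Tensor:
    """Stage 2 (placeholder): translate the text embedding into a
    corresponding image embedding carrying the prompt's semantics."""
    return text_embedding + 0.1 * torch.randn_like(text_embedding)

def image_decoder(image_embedding: torch.Tensor) -> torch.Tensor:
    """Stage 3 (placeholder): render the image embedding as pixels."""
    return torch.rand(3, 64, 64)  # toy 64x64 RGB image

prompt = "photo of an astronaut riding a horse"
image = image_decoder(prior(text_encoder(prompt)))
print(image.shape)  # torch.Size([3, 64, 64])
```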
Although the results are often astonishing, these tools still have several critical aspects: "First of all, datasets can contain prejudices within them, and that is why many companies are reluctant to freely market their tools", Perazzi again explains. "Imagine, for example, the misinformation that could be caused by a system capable, on request, of inventing images of someone in a place they shouldn't be."
Problems can also be of another kind. For example, it has been noted that when asked to produce images of a nurse or an assistant, Dall-E 2 created only images of women, while asking for a lawyer or a CEO invariably produced men. In short, our social prejudices end up in the images we produce, which then become the datasets used by the algorithms, which inevitably absorb and reproduce those same prejudices in turn.
There are other aspects to be addressed: for example, the risk that these tools overwhelm the creative industry by allowing anyone to create images without needing to involve any artist (think, for example, of record covers or advertising posters). Of course, there will still be a need for someone to think about the best prompts to give the artificial intelligence system and then to select the results, a process that - provocatively - could be considered not too different from the way Andy Warhol worked with his assistants.
“Certainly, it is the human being who instructs the models”, concludes Perazzi. “But a little information is enough; there is no need to give details. I believe that tools like Dall-E 2 are partly tools and partly also authors, that these models have a kind of imagination. Sure, they can't create something absolutely new, but we humans can't do that either. In my opinion, there is also a creative element in the art of artificial intelligence.”