The AI that draws what you want seems a little sexist
It was January 2021 when OpenAI, the artificial intelligence research company co-founded by Elon Musk and now backed by Microsoft, presented DALL-E to the world. Built on the GPT-3 algorithm (the one that became famous for its ability to write convincing essays) and named in honor of Pixar's WALL-E and Salvador Dalí, this artificial intelligence system proved able to draw images correctly based only on a textual description.
DALL-E was trained on hundreds of thousands of text-image pairs scraped from the internet. By repeatedly associating captions with pictures, it learned to derive images from text alone. For example, when the algorithm is asked to create a cartoon image of a penguin wearing a blue hat, red gloves, a green t-shirt and yellow pants, the result is what you see below.
Image courtesy of OpenAI
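To give a flavor of what "associating" captions and images means, here is a deliberately tiny sketch, not OpenAI's actual training code: every vector and number is invented. Captions and images are represented as small vectors, and a crude training loop nudges each image vector toward the vector of its own caption, so that afterwards every image is most similar to the caption it was paired with.

```python
import numpy as np

# Toy illustration only (invented stand-ins, not OpenAI's model):
# captions and images become vectors, and "training" pulls each image
# vector toward the caption it was paired with.
rng = np.random.default_rng(0)
dim = 8
caption_vecs = rng.normal(size=(3, dim))  # stand-ins for encoded captions
image_vecs = rng.normal(size=(3, dim))    # stand-ins for encoded images

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

lr = 0.5
for _ in range(50):  # crude "training": pull each image toward its caption
    image_vecs += lr * (caption_vecs - image_vecs)

best_match = [max(range(3), key=lambda j: cosine(image_vecs[i], caption_vecs[j]))
              for i in range(3)]
print(best_match)  # prints [0, 1, 2]: image i now matches caption i
```

The real system learns this alignment from millions of examples with large neural networks, but the principle is the same: representations of an image and of its caption are pushed together until text alone is enough to locate the corresponding image content.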
A little over a year later, OpenAI revealed the next version: DALL-E 2, an artificial intelligence that still generates images from text, but much faster, at a quality that makes them almost indistinguishable from real photographs, and with the added ability to edit them. For example, we can ask DALL-E 2 to create the image of an astronaut on horseback, or of a “fox sitting in a field at sunset in the style of Claude Monet”.
Image courtesy of OpenAI
At this point, the image can be edited just as easily: simply tell the artificial intelligence to replace the astronaut with a basketball player, or the fox with an elephant. But how does such a system work with such precision? “A neural network learns its skills by analyzing huge amounts of data. By identifying the patterns present in thousands of images of, say, bananas, it learns to recognize them,” explains the New York Times. “DALL-E instead searches for patterns by analyzing millions of digital images and their captions, which describe what each image portrays. In this way, it learns to recognize the connections between images and words.”
Unlike its predecessor (which, as mentioned, was based on GPT-3), DALL-E 2 relies on a neural network called CLIP, which, based on the text prompt it receives, begins to sketch out a few characteristic features of the requested image. In the case of an astronaut, for example, CLIP might do no more than hint graphically at the typical space helmet. At this point a second neural network (specifically, a diffusion model) takes over: starting from those salient features, it generates the pixels it expects are needed to complete the image correctly.
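As a rough intuition for that two-stage pipeline, here is a minimal sketch in which all names and numbers are invented stand-ins; the real CLIP and diffusion networks are large learned models. A fake "text encoder" maps the prompt to a feature vector, and a loop of denoising steps moves pure noise toward what those features suggest:

```python
import numpy as np

def encode_text(prompt: str) -> np.ndarray:
    # Stand-in for CLIP: deterministically map a prompt to a feature vector.
    seed = sum(ord(c) for c in prompt)
    return np.random.default_rng(seed).normal(size=16)

def denoise_step(x: np.ndarray, features: np.ndarray, step_size: float) -> np.ndarray:
    # Stand-in for one reverse-diffusion step: nudge the noisy "image"
    # toward what the text features suggest.
    return x + step_size * (features - x)

features = encode_text("an astronaut on horseback")
x = np.random.default_rng(42).normal(size=16)  # start from pure noise
for _ in range(20):
    x = denoise_step(x, features, step_size=0.3)

# After enough steps, the sample sits close to the text-conditioned target.
print(float(np.linalg.norm(x - features)) < 0.1)  # prints True
```

An actual diffusion model predicts and removes noise with a trained network at every step, rather than sliding toward a fixed target, but the overall shape is the same: start from randomness and refine it, step by step, under the guidance of the text features.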
Opportunities and risks

DALL-E 2's capabilities, according to Alex Nichol, one of the researchers on the project, could form the basis of commercial products used by artists, designers and video game developers. Taking a broader view, it is possible to imagine that a system of this kind represents another step toward artificial general intelligence: an algorithm endowed with capacity and flexibility at least equal to those of the human brain.
Unlike other artificial intelligence systems, which specialize exclusively in recognizing images, in generating them, or in conversing in writing, DALL-E 2 (and, to a lesser extent, its previous version) has the remarkable ability to combine text and images, and in some cases to grasp the relationship between the two. Although a statistical model always lies underneath, it is increasingly hard to claim that an artificial intelligence has no idea what it is doing, or does not understand the meaning of words, when it can translate them so accurately into images.
Yet, despite the impressive results, DALL-E 2 also falls victim to what has become known as algorithmic discrimination (which, over the years, has even led to wrongful arrests). Since these algorithms learn by analyzing huge amounts of data from the internet, data generated by humans, they have often been shown to absorb the same prejudices present in our society.
In the case of DALL-E 2, researcher Arthur Holland Michel showed on Twitter what this entails. For example, if you ask DALL-E 2 to create an image of a nurse or a personal assistant, here are the results.
Twitter @WriteArthur
Twitter @WriteArthur
What happens if you ask the artificial intelligence developed by OpenAI to portray a lawyer or a CEO is very different.
Twitter @WriteArthur
Twitter @WriteArthur
In short, because its learning is based on data about professions that carry gender-based discrimination within them, DALL-E 2 makes that discrimination its own, as has already happened in many past cases of “algorithmic prejudice”.
DALL-E 2, however, presents other ethical risks. The tool could potentially be used to create highly convincing deepfakes, for example by telling the software to portray the president of the United States shooting at someone, or by generating realistic pornographic material starring celebrities out of thin air. “We can forge texts. We can put those texts in anyone's mouth. And then we can forge images and videos,” explained Oren Etzioni, head of the Allen Institute for Artificial Intelligence. “Online disinformation already exists, but our fear is that it could reach new levels.”
It is also for these reasons that, for the moment, OpenAI has not made the system available to the public, in addition to setting filters that prevent it from generating certain types of images, embedding a marker in every image indicating that it was generated by artificial intelligence, and more. However, some researchers have already managed to recreate similar (albeit simpler) systems independently: it is inevitably only a matter of time before, just as happened with deepfakes, this type of algorithm is within anyone's reach.