80 Million Tiny Images is offline: MIT apologizes
In 2006 MIT made publicly available a dataset called 80 Million Tiny Images, made up of 79.3 million images collected through Google searches, divided into approximately 75,000 categories and labeled with tags or descriptions related to their content. It was offered as a resource to those working on algorithms capable of automatically interpreting what appears, for example, in a photograph.
Over the years it has been used by several teams working on artificial intelligence and machine learning projects, as well as a benchmark to evaluate the effectiveness of computer vision technologies. Some recent reports, however, have highlighted how systems trained on it exhibit biases that verge on racism and discrimination: dark-skinned people are often associated with offensive terms, and the same goes for images depicting women.
After reviewing the reports, the Massachusetts Institute of Technology decided to take the archive offline and apologize. Here are some translated passages of the message signed by three of the project's leaders.
It has been brought to our attention that the Tiny Images dataset contains some derogatory terms as well as offensive categories and images. This is the consequence of automatic data collection based on the words of the WordNet database. We are deeply sorry for this and apologize to those affected.
80 Million Tiny Images was created 14 years ago, starting from a list of 53,464 nouns taken from WordNet and then downloading the corresponding images from the search engine, using the filters available at the time. The resolution of the files is very small (hence the name): just 32×32 pixels.
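To make that construction process more concrete, here is a minimal sketch of the kind of pipeline described above, assuming a WordNet noun list, an unspecified image search backend (stubbed out here as fetch_image_urls), and downscaling to 32×32 pixels. The function names and libraries (nltk, Pillow, requests) are illustrative choices for this sketch, not the project's actual code.

```python
# Hypothetical sketch of the kind of pipeline described above: take nouns
# from WordNet, fetch images for each from a search engine, and shrink
# them to 32x32 pixels. The search step is a stub, because the original
# engines and filters are not specified here.
from io import BytesIO

import requests  # assumed available for downloading images
from nltk.corpus import wordnet as wn  # requires nltk.download("wordnet") first
from PIL import Image

TINY_SIZE = (32, 32)


def wordnet_nouns(limit=None):
    """Yield distinct noun lemmas from WordNet, roughly how the keyword list was built."""
    seen = set()
    for synset in wn.all_synsets(pos=wn.NOUN):
        for lemma in synset.lemma_names():
            if lemma not in seen:
                seen.add(lemma)
                yield lemma
                if limit and len(seen) >= limit:
                    return


def fetch_image_urls(query):
    """Placeholder: in the real project these came from web image search engines."""
    return []  # plug in your own image search API here


def download_as_tiny(url):
    """Download one image and shrink it to 32x32 RGB, the Tiny Images format."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    img = Image.open(BytesIO(response.content)).convert("RGB")
    return img.resize(TINY_SIZE, Image.BILINEAR)


if __name__ == "__main__":
    for noun in wordnet_nouns(limit=5):
        for url in fetch_image_urls(noun):
            tiny = download_as_tiny(url)
            tiny.save(f"{noun}_{hash(url) & 0xffff}.png")
```

The sketch also illustrates why the problem arose in the first place: whatever terms WordNet supplies, including derogatory ones, end up as search queries and therefore as labels, with no human review in the loop.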
Do not make the mistake of thinking that distortions of this kind, introduced while training algorithms, cannot have concrete consequences. We have written several times on these pages about facial recognition systems, and recently even about an AI created to predict a person's propensity to commit a crime solely on the basis of an analysis of their physiognomy.
Source: MIT