In Spain, an algorithm assesses the risk of violence against women. But it failed
In Spain, an algorithmic system that assesses the risk of recidivism in cases of gender-based violence has been in place for fifteen years. It is called VioGén (Sistema de Seguimiento Integral en los casos de Violencia de Género, the integral system for monitoring cases of gender-based violence) and has been used by the Ministry of the Interior since 2007, under the provisions of Ley Orgánica 1/2004, which introduced measures to combat gender-based violence. The software uses complaint data to make risk predictions: it analyzes the answers given to the police by people reporting violence and assigns a corresponding level of protection based on the severity it detects. In some cases, however, these forecasts can create serious problems. A recent evaluation of the model by Eticas Consulting, an algorithmic auditing company, showed that when the risk is classified as too low, women do not receive help.
According to the ministry's website, the main objective of VioGén is to integrate all information regarding complaints in order to obtain risk forecasts and protect victims throughout the country. Last September, in an interview, one of the creators of the program said that "VioGén has helped to alert the police to cases where there is a risk that a man could act again". According to the Spanish ministry, after a first episode of abuse, violence recurs and continues in 15% of cases. Starting from this figure, the algorithm was designed and implemented, on the idea that, in the presence of certain specific indicators, software might be able to predict that risk and alert the police.
At the time of reporting, women are asked to fill in an online form containing 39 questions to be answered with "present" or "not present" (for example, whether there has been sexual assault, whether the offender is employed, whether he has shown jealousy or used drugs). Through classical statistical models, the answers are then translated into mathematical language and, as in the most common predictive systems, used to assign a risk score ("not detected", "low", "medium", "high", "extreme") to which different types of aid and police protection measures correspond: from simple checks to the activation of an automated notification subsystem capable of issuing alarms. According to 2020 data from the Ministry of the Interior, more than 500,000 complaints filed by women have been registered since VioGén went into operation in 2007.
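As an illustration only, since the real indicators, weights and thresholds are not public, the hypothetical sketch below shows the general shape of such a questionnaire-based score: a handful of invented binary indicators, a weighted sum, and fixed cut-offs that map the sum onto the five protection levels. Every name and number in it is a placeholder, not VioGén's actual model.

```python
# Illustrative sketch only: VioGén's real indicators, weights and thresholds are
# not public, so every value below is an invented placeholder meant to show the
# general shape of a questionnaire-based risk score, not the actual model.

RISK_LEVELS = ["not detected", "low", "medium", "high", "extreme"]

# Hypothetical indicators with hypothetical weights (the real form has 39 items).
WEIGHTS = {
    "sexual_assault": 3.0,
    "death_threats": 2.5,
    "drug_use": 1.5,
    "jealousy": 1.0,
    "offender_unemployed": 0.5,
}

# Hypothetical cut-offs between consecutive risk levels.
THRESHOLDS = [1.0, 3.0, 5.0, 7.0]


def risk_level(answers: dict) -> str:
    """Map 'present' (True) / 'not present' (False) answers to a risk level."""
    score = sum(w for item, w in WEIGHTS.items() if answers.get(item, False))
    for level, threshold in zip(RISK_LEVELS, THRESHOLDS):
        if score < threshold:
            return level
    return RISK_LEVELS[-1]


if __name__ == "__main__":
    report = {"death_threats": True, "jealousy": True}
    print(risk_level(report))  # "medium" with these invented weights
```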
The evaluation of the system
In 2018, Eticas Consulting independently conducted and published an impact analysis of VioGén, collecting a series of hypotheses about its social impact and potential biases, given its very delicate application context. Eticas then offered to carry out a pro bono internal audit that would examine the concrete impact of the algorithm and how it operates, paying particular attention to how well it performs across cultural, socio-economic and territorial groups. After four years without any response from the ministry, the company decided to carry out an external audit: since it could not access the source code, the analysis was conducted using reverse-engineering methods. The results are public, together with the methodology used.
The analysis carried out by Eticas lasted seven months and revealed some potential problems with the algorithm: it found, for example, that in 2021 only one woman out of seven received help after filing a report. Gemma Galdon-Clavell, CEO of Eticas Consulting, explained to sportsgaming.win that "the most obvious bias is the tendency to underestimate the risk of women when they have no children". This emerged from the external audit, and according to Galdon-Clavell, "surely if we had had access to internal data we could have better explained what works and what doesn't, helping to mitigate the risks: in the end, the objective of the audit is not to criticize, but to generate a space of accountability around these technologies".
Since the software was developed, there have been numerous cases of gender-based violence classified as "low risk" that have ended in femicides or in the murder of children. According to the worrying figures reported by El Mundo, as early as 2014, fourteen of the fifteen women murdered that year had reported their attacker, but their risk had been classified as low or "not detected".
This also happens because, very often, police forces themselves are not sufficiently trained to understand the causes of violence against women. As a result, reports of stalking or death threats end up being downplayed rather than recognized as the base of the pyramid of gender-based violence. Similarly, the questions in the questionnaire can be too blunt or even re-victimizing, paralyzing women at the moment they file a report.
Many victims reported being left completely alone while filling out the form, or being "completely lost in the questionnaires between nervousness and crying. There was no one with me to explain the questionnaires", and in some cases being pressured by officers to finish quickly. The scenario that emerges is very different from what was envisaged by the 2004 law that led to the creation of VioGén, which required interdisciplinary teams (psychologists, social workers and forensic doctors) to intervene whenever a complaint is filed. These teams were meant to investigate the psychological aspects that the VioGén module is unable to cover, in addition to carrying out a forensic assessment of the attacker. This indicates that, despite the considerable progress made over the years by Spanish government policies, gender-based violence continues to be a structural problem that has not yet been addressed satisfactorily. In the absence of education and training on the subject, the use of algorithms by the police can therefore end up perpetuating gender stereotypes instead of eradicating them.
Risk forecasting and automation
Reporting to the police, in itself, seems to result only in the assignment of the minimum level of risk. The problem is that there is no public data on the software's decisions: which answers are weighted more than others, or how the binary "present"/"not present" responses are translated into risk levels across the various indicators. Nor is any information available about updates to the algorithm over the last fifteen years, although the software has probably changed over time. Finally, it is not clear how much the risk forecasts influence court decisions.
As the Eticas audit also notes, the levels of transparency, human oversight and accountability associated with VioGén are very low. In 95% of cases, police officers followed the risk score produced by the system. This is not surprising: many similar cases show that automating decision-making processes leads people to perceive the machine as a substitute for, rather than a support to, human judgment. This tends to produce a lack of responsibility and unconditional trust in the system among those who use it. The lack of human oversight can lead to unexpected short circuits even in a system like VioGén, born with the best of intentions.
There seem to be no doubts about the technical reliability of the software. A few years ago, the calculation of the AUC (area under the curve), one of the most widely used metrics for evaluating the performance of predictive models, gave "absolutely satisfactory" results. Given their enormous social impact, however, in the case of automated decisions or recommendations we should perhaps ask ourselves what broader meaning to attribute to their functioning. Are we talking only about technical performance? Or also about the ability to adapt to the context? Above all, we should take into account the human ability to integrate software into a larger system of practices, where these systems should serve as supporting components. Performance calculations do not account for this, nor for the fact that VioGén was not intended as a standalone system but is used as one. How, then, can the social functioning of such a complex model be evaluated?
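For readers unfamiliar with the metric, the short sketch below shows what an AUC evaluation of a binary risk model typically looks like in practice, here using scikit-learn. The labels and scores are invented purely for demonstration and say nothing about VioGén's actual data or performance.

```python
# Minimal illustration of the AUC metric mentioned above, using scikit-learn.
# The labels (did violence recur?) and the risk scores are invented: they only
# show what an AUC evaluation measures, not VioGén's actual performance.
from sklearn.metrics import roc_auc_score

y_true = [0, 0, 1, 0, 1, 1, 0, 1]                     # 1 = violence recurred
y_score = [0.1, 0.3, 0.35, 0.2, 0.8, 0.6, 0.4, 0.7]   # model's predicted risk

auc = roc_auc_score(y_true, y_score)
print(f"AUC = {auc:.2f}")  # 1.0 = perfect ranking, 0.5 = no better than chance
```

A high AUC only says the model ranks cases well on historical data; it says nothing about how the score is used, or misused, once it reaches a police station.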
A possible answer should certainly include listening to and involving the people and communities directly affected. In this case, that means speaking directly with those who file reports and answer the questionnaire, to understand how they perceive and judge the system and how it is explained to them, but also with those who use it, such as police officers. The inclusion of users in the design phases, which should be an essential step, is not practiced in most of these situations. At the moment, moreover, there is no obligation in Europe to evaluate this kind of software or to carry out internal audits like the one Eticas had proposed for VioGén, with access to the code, datasets, weights and errors, despite its potential impact on so many people when employed by public administrations.
The European regulation on artificial intelligence should introduce some of these requirements, probably starting from 2024, in all member states. One of the critical points of the current text is that a system such as VioGén would fall outside the scope of the law, because it cannot be included in any of the uses currently classified as "high risk", and there would therefore be no obligation to maintain up-to-date documentation or carry out periodic checks.
This example makes it very clear that the approach to automated processes should take concrete impacts much more into consideration: the technical premises are not enough, since some bias dynamics only become evident once the systems are applied to society. At that point, combined with the abdication of human responsibility, they can cause serious harm to individuals, to society and to specific groups of people, without the consequences having been foreseen and when it is too late to remedy them. Predictive systems, given their enormous power to influence decisions based on historical data, should be subject to greater public scrutiny.
The hope, for Gemma Galdon-Clavell too, is that "the public administration understands that systems with such a high social impact cannot be created without methodologies that guarantee the proper functioning of the system and that incorporate end users into the process".