close
close

Solondais

Where news breaks first, every time

sinolod

EU AI law auditor reveals big tech compliance pitfalls

Some of the most prominent artificial intelligence models fail to meet EU regulations in key areas such as cybersecurity resilience and discriminatory outcomes, according to data seen by Reuters.

The EU had long debated new AI regulations before OpenAI made ChatGPT public in late 2022. The record popularity and subsequent public debate over the supposed existential risks of such models prompted lawmakers to craft rules specific around “general purpose” AI. .

Now, a new tool designed by Swiss startup LatticeFlow and its partners, and backed by European Union officials, has tested generative AI models developed by major tech companies like Meta and OpenAI across dozens of categories, in line with the comprehensive EU AI law, which will come into force in stages over the next two years.

Assigning each model a score between 0 and 1, a ranking released Wednesday by LatticeFlow shows that models developed by Alibaba, Anthropic, OpenAI, Meta and Mistral all received average scores of 0.75 or higher.

However, the company’s Large Language Model (LLM) Checker revealed gaps in some models in key areas, highlighting areas where businesses may need to reallocate resources in order to ensure compliance.

Companies that fail to comply with the AI ​​law would face fines of $38 million, or 7% of global annual turnover.

Mixed results

Currently, the EU is still trying to establish how AI law rules regarding generative AI tools such as ChatGPT will be applied, convening experts to develop a code of practice governing the technology by spring 2025.

But LatticeFlow’s test, developed in collaboration with researchers at Swiss university ETH Zurich and Bulgarian research institute INSAIT, offers an early indicator of specific areas where tech companies are at risk of breaking the law.

For example, discriminatory results are a persistent problem in the development of generative AI models, reflecting human biases regarding gender, race, and other areas when prompted.

In discriminatory output testing, LatticeFlow’s LLM Checker gave OpenAI’s “GPT-3.5 Turbo” a relatively low score of 0.46. For the same category, Alibaba Cloud’s 9988.HK “Qwen1.5 72B Chat” model only received a rating of 0.37.

In testing for “prompt hijacking,” a type of cyberattack in which hackers disguise a malicious prompt as legitimate to extract sensitive information, the LLM Checker gave Meta’s “Llama 2 13B Chat” template a score of 0 .42. In the same category, the “8x7B Instruct” model from the French startup Mistral received 0.38.

“Claude 3 Opus,” a model developed by Google-backed Anthropic, received the highest average rating, 0.89.

The test has been designed in accordance with the text of the AI ​​Act and will be expanded to encompass other enforcement measures as they are introduced. LatticeFlow said the LLM Checker will be available for free so developers can test their models for compliance online.

Petar Tsankov, the company’s CEO and co-founder, told Reuters the test results were generally positive and offered a roadmap for companies to refine their models in accordance with the AI ​​law.

“The EU is still developing all the compliance criteria, but we can already see some gaps in the models,” he said. “By placing greater emphasis on optimizing compliance, we believe model providers can be well prepared to meet regulatory requirements.”

Meta declined to comment. Alibaba, Anthropic, Mistral and OpenAI did not immediately respond to requests for comment.

Although the European Commission cannot verify external tools, the body was briefed throughout the development of the LLM Checker and described it as a “first step” in implementing the new laws.

A European Commission spokesperson said: “The Commission welcomes this study and platform for evaluating AI models, which constitutes a first step towards translating European AI law into technical requirements. »