Solondais

Exclusive: EU AI Act Checker Reveals Big Tech’s Compliance Pitfalls

By Martin Coulter

LONDON (Reuters) – Some of the most prominent artificial intelligence models fail to meet European regulations in key areas such as cybersecurity resilience and discriminatory outcomes, according to data seen by Reuters.

The EU had long debated new AI regulations before OpenAI released ChatGPT to the public in late 2022. The record-breaking popularity of the chatbot, and the ensuing public debate over the supposed existential risks of such models, spurred lawmakers to draw up specific rules around “general-purpose” AI (GPAI).

Now a new tool, welcomed by European Union officials, has tested generative AI models developed by major tech companies such as Meta and OpenAI across dozens of categories, in line with the EU’s sweeping AI Act, which is coming into force in stages over the next two years.

Designed by Swiss startup LatticeFlow AI and its partners at two research institutes, ETH Zurich and INSAIT in Bulgaria, the framework assigns AI models a score between 0 and 1 in dozens of categories, including technical robustness and security.

A ranking released Wednesday by LatticeFlow shows that models developed by Alibaba, Anthropic, OpenAI, Meta and Mistral all received average scores of 0.75 or higher.

However, the company’s Large Language Model (LLM) Checker uncovered shortcomings in some models in key areas, showing where companies may need to redirect resources to ensure compliance.

Companies that fail to comply with the AI Act face fines of up to 35 million euros ($38 million) or 7% of global annual turnover.

MIXED RESULTS

The EU is still trying to establish how the AI Act’s rules on generative AI tools such as ChatGPT will be enforced, and is convening experts to draft a code of practice governing the technology by spring 2025.

But the test offers an early indicator of specific areas where tech companies may be breaking the law.

For example, discriminatory results are a persistent problem in the development of generative AI models, reflecting human biases regarding gender, race, and other areas when prompted.

In discriminatory output testing, LatticeFlow’s LLM Checker gave OpenAI’s “GPT-3.5 Turbo” a relatively low score of 0.46. For the same category, Alibaba Cloud’s “Qwen1.5 72B Chat” model only received a rating of 0.37.

In testing for “prompt hijacking,” a type of cyberattack in which hackers disguise a malicious prompt as legitimate to extract sensitive information, the LLM Checker gave Meta’s “Llama 2 13B Chat” model a score of 0.42. In the same category, the “8x7B Instruct” model from the French startup Mistral received 0.38.

“Claude 3 Opus,” a model developed by Google-backed Anthropic, received the highest average rating, 0.89.

The test has been designed in accordance with the text of the AI Act and will be expanded to encompass other enforcement measures as they are introduced. LatticeFlow said the LLM Checker will be freely available online so developers can test their models for compliance.

Petar Tsankov, the company’s CEO and co-founder, told Reuters the test results were generally positive and offered a roadmap for companies to refine their models in accordance with the AI Act.

“The EU is still developing all the compliance criteria, but we can already see some gaps in the models,” he said. “By placing greater emphasis on optimizing compliance, we believe model providers can be well prepared to meet regulatory requirements.”

Meta and Mistral declined to comment. Alibaba, Anthropic and OpenAI did not immediately respond to requests for comment.

Although the European Commission cannot verify external tools, the body was briefed throughout the development of the LLM Checker and described it as a “first step” in implementing the new laws.

A European Commission spokesperson said: “The Commission welcomes this study and platform for evaluating AI models, which constitutes a first step towards translating European AI law into technical requirements.”

($1 = 0.9173 euros)

(Reporting by Martin Coulter; editing by Hugh Lawson and Bernadette Baum)