
Solondais


AI21 CEO Says Transformers Not Suitable for AI Agents Due to Perpetuation of Errors

As more companies look toward the so-called agentic future, one of the obstacles may lie in how AI models are built. For enterprise AI developer AI21, the answer is clear: the industry must look to alternative model architectures to enable more effective AI agents.

Ari Goshen, CEO of AI21, said in an interview with VentureBeat that Transformers, the most popular model architecture, has limitations that would make a multi-agent ecosystem difficult.

“One trend I’m seeing is the rise of architectures that are not Transformers, and these alternative architectures will be more efficient,” Goshen said. “Transformers work by creating so many tokens that can cost a lot of money.”

AI21, which focuses on developing enterprise AI solutions, has previously argued that Transformers should be an option for model architecture, not the default. It builds foundation models on its Jamba architecture – short for Joint Attention and Mamba – which combines attention with the Mamba architecture developed by researchers at Princeton University and Carnegie Mellon University, and can offer faster inference times and longer context.

Goshen said that alternative architectures, like Mamba and Jamba, can often make agentic structures more efficient and, importantly, affordable. For him, Mamba-based models have better memory performance, which would allow agents, especially agents that connect to other models, to perform better.

He attributes the fact that AI agents are only now gaining popularity – and that most have not yet made it into production – to the reliance on LLMs built on transformers.

“The main reason agents are not yet in production mode is reliability – or the lack of it,” Goshen said. “When you break down a transformer model, it’s very stochastic, so any errors will perpetuate.”
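Goshen’s point about errors perpetuating can be made concrete with a back-of-the-envelope calculation (an illustration, not from the interview): if each step in an agent pipeline is correct with probability p, then a chain of n dependent model calls succeeds end-to-end with probability roughly p to the power n, which falls off quickly even for high per-step accuracy.

```python
# Illustration only: how small per-step error rates compound across
# a chain of dependent agent/model calls.

def chain_success_probability(p_step: float, n_steps: int) -> float:
    """Probability that all n steps succeed, assuming each step is
    independent and correct with probability p_step."""
    return p_step ** n_steps

if __name__ == "__main__":
    for n in (1, 5, 20, 50):
        print(f"{n:>3} steps at 98% per-step accuracy -> "
              f"{chain_success_probability(0.98, n):.1%} end-to-end")
```

At 98% per-step accuracy, a 50-step chain succeeds end-to-end only about a third of the time – a simple way to see why reliability, not raw capability, blocks production agents.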

Enterprise agents are growing in popularity

AI agents have emerged as one of the biggest trends in enterprise AI this year. Several companies have launched agents and AI platforms to make agent creation easier.

ServiceNow announced updates to its Now Assist AI platform, including a library of AI agents for customers. Salesforce has its line of agents called Agentforce while Slack has started allowing users to integrate agents from Salesforce, Cohere, Workday, Asana, Adobe and more.

Goshen believes that this trend will become even more popular with the right combination of models and model architectures.

“Some use cases we see now, like chatbot Q&A, are basically glorified search,” he said. “I think true intelligence is about connecting and retrieving different information from sources.”

Goshen added that AI21 is developing offerings around AI agents.

Other architectures vying for attention

Goshen strongly supports alternative architectures like Mamba and AI21’s Jamba, mainly because he believes transformer models are too expensive and unwieldy to use.

Instead of the attention mechanism that forms the backbone of transformer models, Mamba uses a state-space mechanism that can prioritize different data and assign weights to inputs, optimize memory usage, and make fuller use of a GPU’s processing power.
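As a rough sketch of the idea (my illustration, not AI21’s code): a state-space layer replaces attention’s all-pairs comparison of tokens with a recurrence whose hidden state is updated in a single pass over the sequence, so cost grows linearly with sequence length rather than quadratically. In Mamba the transition parameters are additionally input-dependent (“selective”); the toy below mimics that with a simple per-step gate.

```python
import numpy as np

def sigmoid(z):
    """Numerically standard logistic function."""
    return 1.0 / (1.0 + np.exp(-z))

def selective_ssm_scan(x, w_gate, a_base=0.9):
    """Toy selective state-space recurrence (illustration only).

    Processes a sequence x of shape (seq_len, dim) in one linear pass:
        h_t = a_t * h_{t-1} + (1 - a_t) * x_t
    where the decay a_t is gated by the current input -- a crude
    stand-in for Mamba's input-dependent (selective) transitions.
    """
    seq_len, dim = x.shape
    h = np.zeros(dim)
    outputs = np.empty_like(x)
    for t in range(seq_len):
        # Input-dependent gate in (0, 1): how much past state to keep.
        a_t = a_base * sigmoid(x[t] @ w_gate)
        h = a_t * h + (1.0 - a_t) * x[t]
        outputs[t] = h
    return outputs

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(16, 4))          # 16 tokens, 4 channels
    y = selective_ssm_scan(x, rng.normal(size=4))
    print(y.shape)  # (16, 4)
```

The key contrast with attention: the loop touches each token once and carries a fixed-size state `h`, instead of comparing every token against every other token at every layer.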

Mamba is gaining popularity. Other open-source and open-weight AI developers have started releasing models based on Mamba in recent months. Mistral released Codestral Mamba 7B in July and in August Falcon released its own Mamba-based model, Falcon Mamba 7B.

However, transformer architecture has become the default, if not standard, choice when developing foundation models. OpenAI’s GPT is, of course, a transformer model (it’s literally in the name), but so are most other popular models.

Goshen said that ultimately, companies want to choose the most reliable approach. But organizations should also be wary of flashy demos promising to solve many of their problems.

“We are in the phase where charismatic demos are easy to achieve, but we are closer to the demo phase than the product phase,” Goshen said. “It’s OK to use enterprise AI for research, but we’re not yet at the point where companies can use it to inform decisions.”