
Moving beyond GPUs: The evolving landscape of AI chips and accelerators




This article is part of a special issue of VB titled “Fit for Purpose: Tailoring AI Infrastructure.” Find all other stories here.

Data centers are the backbone of the Internet as we know it. Whether it’s Netflix or Google, all large companies rely on data centers and self-hosted computer systems to deliver digital services to end users. As enterprise attention shifts to advanced AI workloads, traditional CPU-centric data center servers are being augmented with new specialized chips, or “coprocessors.”

The idea behind these coprocessors is to add hardware that boosts the computing power of servers, enabling them to meet the demands of workloads such as AI training, inference, database acceleration and networking functions. Over the past few years, GPUs, led by Nvidia, have been the top choice for coprocessors thanks to their ability to process large volumes of data at unmatched speeds. Driven by this demand, GPUs accounted for 74% of coprocessors supporting AI use cases in data centers last year, according to a study by Futurum Group.

The study projects that GPU dominance will only increase, with revenues from the category growing 30% annually to $102 billion by 2028. The catch is that while GPUs, with their parallel processing architecture, are well suited to accelerating all kinds of large-scale AI workloads (such as training and running massive, trillion-parameter language models or genome sequencing), their total cost of ownership can be very high. For example, Nvidia’s flagship GB200 “superchip,” which combines a Grace CPU with two B200 GPUs, is expected to cost between $60,000 and $70,000. A server with 36 of these superchips is estimated to cost around $2 million.
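For context, the server estimate follows directly from the per-chip figures. Here is a quick back-of-the-envelope sketch in Python, using only the prices cited above (reported estimates, not vendor quotes):

```python
# Back-of-the-envelope hardware cost for a GB200-based server, using only
# the per-chip figures cited above. These are reported estimates, not
# vendor quotes, and exclude networking, power, cooling and facilities.
price_low, price_high = 60_000, 70_000   # USD per GB200 superchip
superchips_per_server = 36

low = price_low * superchips_per_server
high = price_high * superchips_per_server
print(f"Estimated server hardware cost: ${low:,} - ${high:,}")
# -> $2,160,000 - $2,520,000, consistent with the ~$2 million estimate
```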

While this may work in some cases, such as large-scale projects, it does not work for every company. Many enterprise IT managers are looking to adopt new technology to support select low- and medium-intensity AI workloads, with a particular focus on total cost of ownership, scalability and integration. After all, most AI models (deep learning networks, neural networks, large language models, etc.) are in a maturing phase, with needs shifting toward AI inference and performance improvements for specific workloads such as image recognition, recommender systems or object identification – while staying cost-effective at the same time.


This is where the emerging landscape of specialized AI processors and accelerators, built by chipmakers, startups and cloud service providers, comes into play.

What exactly are AI processors and accelerators?

In essence, AI processors and accelerators are chips that reside in the server processor ecosystem and focus on specific AI functions. They typically rely on three key architectures: application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), and the recent innovation of neural processing units (NPUs).

ASICs and FPGAs have been around for a long time, and the main difference between them is programmability. ASICs are custom-built from scratch for a specific task (which may or may not be AI-related), while FPGAs can be reconfigured at a later stage to implement custom logic. NPUs differ from both in that they are specialized hardware dedicated solely to accelerating AI/ML workloads such as neural network inference and training.
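In practice, software stacks abstract much of this difference away. As a rough illustration, a runtime such as ONNX Runtime can dispatch the same model to whichever back end is present; the GPU and CPU provider names below are real, while NPU and ASIC vendors ship their own providers, represented here by a hypothetical placeholder:

```python
import onnxruntime as ort

# The same exported model can be dispatched to different coprocessors by
# listing execution providers in order of preference. The GPU and CPU
# provider names are standard in ONNX Runtime; NPU and ASIC vendors ship
# their own providers, represented by the hypothetical placeholder below.
preferred = [
    "VendorNPUExecutionProvider",  # hypothetical vendor-specific NPU back end
    "CUDAExecutionProvider",       # Nvidia GPU back end
    "CPUExecutionProvider",        # always-available fallback
]

# Keep only the providers actually installed in this environment.
available = set(ort.get_available_providers())
providers = [p for p in preferred if p in available]

# "model.onnx" is a placeholder for any exported model file.
session = ort.InferenceSession("model.onnx", providers=providers)
print("Running on:", session.get_providers()[0])
```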

“Accelerators are typically able to perform a single function on their own, and sometimes, in a board-scale or multi-chip ASIC design, they can support several different applications. NPUs are a good example of a specialized chip (typically part of a system) that can handle many matrix math and neural network use cases, as well as various inference tasks, while consuming less power,” Futurum Group CEO Daniel Newman tells VentureBeat.

The best part is that accelerators, especially ASICs and NPUs built for specific applications, can prove to be more efficient than GPUs in terms of cost and power consumption.

“GPU designs mainly focus on arithmetic logic units (ALUs), so they can perform thousands of computations simultaneously, while AI accelerator designs mainly focus on tensor processor cores (TPCs) or units. Generally speaking, the comparison of AI accelerator performance versus GPU performance is based on the fixed function of that design,” Rohit Badlaney, general manager of IBM cloud and industry platforms, tells VentureBeat.
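The “matrix math and neural network use cases” these chips target come down to large matrix multiplications: a single fully connected layer is one big matmul, which is precisely the operation tensor cores and TPCs specialize in. A minimal NumPy illustration (shapes chosen arbitrarily):

```python
import numpy as np

# One fully connected layer: activations (batch x d_in) multiplied by
# weights (d_in x d_out). Inference cost is dominated by exactly this
# kind of matrix multiply, the operation TPCs and tensor units target.
batch, d_in, d_out = 32, 4096, 4096
activations = np.random.randn(batch, d_in).astype(np.float32)
weights = np.random.randn(d_in, d_out).astype(np.float32)

output = activations @ weights
# ~2 * 32 * 4096 * 4096 ≈ 1.07 billion floating-point operations per call
print(output.shape)  # (32, 4096)
```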

Today, IBM takes a hybrid cloud approach, leveraging multiple GPUs and AI accelerators across its stack, including products from Nvidia and Intel, to give enterprises options for meeting the needs of their unique workloads and applications – while maintaining high performance and efficiency.

“Our full-stack solutions aim to help transform the way enterprises, developers and the open source community create and use generative AI. AI accelerators are one of the offerings that we believe will be very beneficial for customers looking to implement generative AI,” Badlaney said. He added that while GPU systems are best suited for training and tuning large models, there are many AI tasks that accelerators can handle just as well – and at a lower cost.

For example, IBM Cloud virtual servers use the Intel Gaudi 3 accelerator with a custom software stack designed specifically for inference and large memory requirements. The company also plans to use the accelerator for tuning and small training workloads via small clusters of multiple systems.

“AI accelerators and GPUs can be used effectively for some similar workloads such as LLM and diffusion models (image generation such as Stable Diffusion), as well as standard object recognition, classification and voice dubbing. However, the benefits and differences between AI accelerators and GPUs depend entirely on the hardware vendor’s design. For example, the Gaudi 3 AI accelerator is designed to deliver significant architecture-based increases in computing power, memory bandwidth, and energy efficiency,” Badlaney explained.

This, in his opinion, translates directly into benefits in terms of price-performance ratio.

Beyond Intel’s products, other AI accelerators are also attracting attention in the market. These include not only custom chips built for and by public cloud providers such as Google, AWS and Microsoft, but also dedicated products (in some cases NPUs) from startups such as Groq, Graphcore, SambaNova Systems and Cerebras Systems. They all stand out in their own way, challenging GPUs in different areas.

In one case, Tractable, a company that develops AI to analyze property and vehicle damage for insurance claims, was able to leverage Graphcore’s Intelligence Processing Unit (IPU)-POD system (a specialized NPU offering) to achieve significant performance gains over the graphics processors it had used previously.

“We saw approximately a 5x increase in speed,” Razvan Ranca, co-founder and chief technology officer at Tractable, wrote in a blog post. “This means that a researcher can now perform potentially five times as many experiments, which means we speed up the entire R&D process and ultimately end up with better models in our products.”

In some cases, AI processors also support training workloads. For example, the AI supercomputer at the Aleph Alpha data center uses Cerebras CS-3, a system powered by the third-generation Wafer Scale Engine with 900,000 AI cores, to create next-generation sovereign AI models. Even Google’s recently introduced custom ASIC, TPU v5p, powers some AI training workloads for companies like Salesforce and Lightricks.

What should be the approach to selecting accelerators?

Now that it’s clear there are many AI processors beyond GPUs for accelerating AI workloads, especially inference, the question becomes: how does an IT manager choose the best option to invest in? Some of these chips may deliver good performance and efficiency but be limited in the types of AI tasks they can handle due to their architecture. Others may do more, but the difference in total cost of ownership compared to GPUs may not be that large.

Because the answer varies by chip design, all experts VentureBeat spoke with suggested that the choice should be based on the scale and type of workload being processed, data, likelihood of further iteration/change, and cost and availability needs.
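Those criteria can be turned into simple arithmetic. The sketch below compares hypothetical candidates on cost per query; every number is a placeholder to be replaced with a team’s own measurements and pricing:

```python
# Hypothetical price-performance comparison for an inference workload.
# Throughput and pricing below are placeholders; a real evaluation would
# substitute measured numbers and actual quotes. Only the arithmetic
# (cost per unit of work, not raw speed) is the point here.
candidates = {
    "gpu_server":      {"throughput_qps": 1200, "hourly_cost_usd": 12.0},
    "npu_accelerator": {"throughput_qps": 800,  "hourly_cost_usd": 4.0},
}

for name, c in candidates.items():
    queries_per_dollar = c["throughput_qps"] * 3600 / c["hourly_cost_usd"]
    print(f"{name}: {queries_per_dollar:,.0f} queries per dollar")
# Here the slower chip wins on cost per query despite lower raw throughput.
```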

According to Daniel Kearney, chief technology officer at Sustainable Metal Cloud, which helps companies with AI training and inference, it’s also important for enterprises to benchmark to check price-to-performance benefits and make sure their teams are familiar with the broader software ecosystem that supports relevant AI accelerators.

“While detailed workload information may not be available in advance or may be ambiguous to support decision-making, it is recommended to compare and test against representative workloads and real-world tests, along with peer-reviewed data where available, to provide a data-driven approach to selecting the right AI accelerator for the right workload. This preliminary examination can save a lot of time and money, especially for large and expensive training jobs,” he suggested.
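A minimal version of the benchmarking Kearney describes simply times a representative workload and converts throughput into cost per unit of work. Everything below is illustrative; run_inference is a stand-in for the actual model call on the device under test:

```python
import time

def run_inference(batch):
    """Stand-in for the real model call on the device under test."""
    time.sleep(0.01)  # placeholder latency; replace with actual inference

def benchmark(num_batches=100, batch_size=8, hourly_cost_usd=4.0):
    start = time.perf_counter()
    for _ in range(num_batches):
        run_inference([0] * batch_size)
    elapsed = time.perf_counter() - start
    items_per_sec = num_batches * batch_size / elapsed
    # Convert measured throughput plus an hourly price into cost per unit.
    cost_per_million = hourly_cost_usd / (items_per_sec * 3600) * 1_000_000
    print(f"{items_per_sec:,.0f} items/s, ${cost_per_million:.2f} per 1M items")

benchmark()
```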

Globally, as inference-related workloads grow, the total market for AI hardware, including AI chips, accelerators and GPUs, is expected to grow 30% annually and reach $138 billion by 2028.