With the release this month of the first commercial server based on its Power9 processor, IBM is reaching another milestone in its quest to be the AI-workload leader for data centers and web service providers.
The Power9 chips in the systems hitting the market now don’t rev up to the top speeds provided by Intel’s Xeon Scalable Processor line, but they offer blazing throughput aimed at giving them an edge in machine learning and accelerated database applications.
IBM unveiled its first Power9 server, the Power System AC922, Tuesday at the AI Summit in New York. It runs a version of the Power9 chip tuned for Linux, with the four-way multithreading variant SMT4. Power9 chips with SMT4 can offer up to 24 cores, though the chips in the AC922 top out at 22 cores. The fastest Power9 in the AC922 runs at 3.3GHz.
The air-cooled AC922 model 8335-GTG, set for release mid-December, as well as two other models (one air-cooled and one water-cooled) scheduled to ship in the second quarter of next year, offer two Power9 chips each and run Red Hat and Ubuntu Linux.
In 2018 IBM plans to release servers with a version of the Power9 tuned for AIX and System i, with SMT8 eight-way multithreading and PowerVM virtualization, topping out at 12 cores but likely running at faster clock speeds.
IBM has been working on the new-generation Power processor for four years, and started revealing specifications last year. The history of Power processors is intertwined with IBM’s AI efforts, and the chips complement the company’s Watson artificial intelligence cloud service. The Watson system that beat humans in “Jeopardy” in 2011 ran on Power7 processors.
The Power9 family distinguishes itself by being the first processor line to support a range of new I/O technologies, including PCI-Express 4.0 and NVLink 2.0, as well as OpenCAPI, an interface architecture for high-bandwidth AI and database accelerators such as ASICs and FPGAs.
These technologies allow the processors to work with a variety of coprocessors for workloads related to machine learning, high-performance computing, visual computing, and hyperscale web serving.
“We are thrilled to be able to introduce the Power9 to market in the AC922 — we have staked our claim to leadership in the AI workload space and this solidifies our position there,” said Stefanie Chiras, vice president of IBM Power Systems.
“When it comes down to AI workloads it really is all about the data: How do you get data in, compute it and move it out and get that [machine-learning] model trained as fast and as accurate as possible with the most data,” Chiras said.
PCIe 4 provides bandwidth of up to 16 gigatransfers per second per lane, twice that of PCIe 3, which Intel uses. NVLink 2.0 enables bandwidth of up to 25Gbps for Nvidia GPUs, the coprocessors of choice for artificial intelligence and so-called accelerated workloads such as those handled by the Kinetica distributed, in-memory database management system for advanced analytics.
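To put the PCIe transfer rates above in more familiar terms, here is a rough back-of-the-envelope sketch in Python. It assumes a 16-lane slot and the 128b/130b line encoding used by both PCIe generations; neither detail comes from IBM's announcement. The function name is purely illustrative, and the result ignores protocol overhead, so treat it as an upper bound rather than a measured figure.

```python
# Rough per-direction PCIe throughput derived from the per-lane transfer rate.
# Assumptions (not from the article): x16 slot, 128b/130b line encoding,
# and no allowance for packet or protocol overhead.

ENCODING_EFFICIENCY = 128 / 130  # payload bits per bit on the wire


def pcie_throughput_gbytes(gt_per_s: float, lanes: int = 16) -> float:
    """Approximate usable gigabytes per second in one direction."""
    gbits_per_s = gt_per_s * ENCODING_EFFICIENCY * lanes
    return gbits_per_s / 8  # bits to bytes


gen3 = pcie_throughput_gbytes(8.0)   # PCIe 3.0: 8 GT/s per lane
gen4 = pcie_throughput_gbytes(16.0)  # PCIe 4.0: 16 GT/s per lane

print(f"PCIe 3.0 x16: {gen3:.1f} GB/s")
print(f"PCIe 4.0 x16: {gen4:.1f} GB/s")
print(f"Gen4 / Gen3:  {gen4 / gen3:.1f}x")
```

Under those assumptions, the doubling of the per-lane rate carries straight through to a doubling of slot throughput, which is the point of the comparison.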
Kinetica says tests show that its database performs 1.8 times faster on Power9 than it did on Power8. IBM says the Power9-based AC922 was also designed to drive demonstrable performance improvements across AI frameworks such as Chainer, TensorFlow and Caffe.
“The most impressive and distinctive thing [about Power9] is the I/O; essentially it’s like the Swiss army knife of machine learning acceleration,” said Patrick Moorhead, principal at Moor Insights & Strategy.
Power9 processors are also being used in non-IBM systems, a powerful endorsement, Moorhead noted. For example, Google and Rackspace are designing a system called Zaius that uses Power9 and OpenCAPI, slated to be commercially available next year.
“Our focus is not only to do silicon but to allow system level value and allow others to innovate around it,” IBM’s Chiras said.
The U.S. Department of Energy’s Summit and Sierra supercomputers, at Oak Ridge National Laboratory and Lawrence Livermore National Laboratory, respectively, are also based on Power9.
IBM’s AC922 features two air-cooled models that each offer two Power9 processors with 16 to 20 cores, running from 2.25GHz to 3.12GHz, complemented by two to four Nvidia Volta V100 GPUs. The water-cooled version due out in the second quarter of 2018 will offer Power9 chips with 18 to 22 cores running at 2.55GHz to 3.3GHz, and two or four Nvidia V100s.
Power9 processors in the AC922 top out at a slower clock speed than the “Platinum” tier chip in Intel’s Xeon Processor Scalable Family, which runs at up to 3.6GHz. But even though the Xeon has up to 28 cores with 56 threads, the AC922’s 22-core Power9 with SMT4 has 88 threads. The higher thread density, which enables efficient use of processor resources and improves throughput, coupled with the Power9’s I/O capabilities, promises to give a boost to machine-learning workloads.
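The thread-count comparison is simple multiplication, sketched below in Python using only the core counts, threading modes and clock speeds quoted above. The `Chip` class and its field names are illustrative, not part of any vendor tool, and the Xeon's two threads per core is inferred from the 28-core, 56-thread figure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chip:
    """Illustrative container for the per-socket figures quoted in the article."""
    name: str
    cores: int
    threads_per_core: int
    max_ghz: float

    @property
    def hardware_threads(self) -> int:
        # Hardware threads per socket = cores x simultaneous threads per core.
        return self.cores * self.threads_per_core


power9 = Chip("Power9 (AC922, SMT4)", cores=22, threads_per_core=4, max_ghz=3.3)
xeon = Chip("Xeon Scalable Platinum", cores=28, threads_per_core=2, max_ghz=3.6)

for chip in (power9, xeon):
    print(f"{chip.name}: {chip.hardware_threads} threads at up to {chip.max_ghz}GHz")
```

Thread count alone says nothing about per-thread performance, so this only illustrates the density argument, not a benchmark result.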
All AC922 models will offer 512KB of private L2 cache per core; 10MB of shared L3 cache per pair of cores; 8 DIMMs (dual in-line memory modules) per processor; and 8GB to 128GB 2666MHz DDR4 DIMMs. They also feature two USB 3.0 and two 16GB Ethernet ports; total disk storage of 7.68TB; and four PCIe Gen4 slots. System dimensions are 441.5 mm wide by 86 mm high by 822 mm deep. Prices will be released as the systems ship.
PowerAI software tools for machine-learning applications will be available for Power9 systems in early 2018. PowerAI enables, for example, Distributed Deep Learning — splitting deep-learning training jobs across multiple physical servers.
“I do believe IBM is in the lead when it comes to those tools and IBM is in the lead when it comes to the machine learning as well,” Moorhead said.
Intel up to now has had a virtual monopoly in server chips, with well over 90 percent of the market. But with Power9, IBM hopes to capture 20 percent of the market by 2020.
It’s been a busy year in the server-chip market, with Intel rolling out its Xeon Scalable line, the biggest revamp of its big-system processors in 10 years, in July. But it has challengers other than IBM. AMD in June unveiled its Epyc chip line, offering competitive per-watt performance and one-socket server deployment for maximum efficiency.
Applications that run on Xeons don’t have to be rewritten for Epyc since it’s based on the x86 architecture. They do have to be rewritten for Power9 systems, but the scale of cloud services and the demand for applications such as machine learning may make it economical to adapt software to hardware that runs those workloads very efficiently. Comparative real-world benchmarks are needed before that determination can be made, though.