Editor’s note: Marvell has announced the next-generation ThunderX3 and ThunderX4. Hopefully they make more of a splash than the ThunderX2 did.
Everybody loves an underdog story.
With the longstanding dominance of Intel’s Xeon chips in the server market, any potential contender is bound to draw real market interest.
Cavium’s new ARM processor, the ThunderX2, has certainly drawn interest.
As with 3D XPoint memory, it’s been touted in various PR pieces, tech journals, and blogs as the next big thing.
As with many things, the hype is justified in some places and less so in others.
The Cavium ThunderX2’s price point is very competitive.
It has enough SKUs to provide a flexible lineup, and its platforms look set to rival standard x86 server offerings.
Even vendors like HPE and Cray are stepping up to the plate.
However, as with any underdog story, there are still many hurdles for the Cavium ThunderX2 to overcome.
It may be some time before it can claim true market presence after its official release.
In this article we’ll go over what is known so far and the important metrics to note when evaluating the ThunderX2.
Cavium ThunderX2 Price
Cavium is certainly impressive from a price standpoint, with chips ranging from $800 to $1795.
Cavium’s core offering, the 9980, has comparable performance to the Intel Xeon Gold 6148 at a 70% lower price tag.
In performance per dollar, Cavium compares similarly well across the SKU lineup.
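To make the performance-per-dollar claim concrete, here is a minimal arithmetic sketch. It assumes exactly equal performance (the article only says "comparable") and takes the 70% figure at face value; the function name and parameters are illustrative, not from any vendor tool.

```python
def perf_per_dollar_ratio(relative_performance: float, price_discount: float) -> float:
    """Return how many times more performance per dollar the cheaper chip delivers.

    relative_performance: cheaper chip's performance divided by the reference
        chip's performance (1.0 means equal; an assumption in this sketch).
    price_discount: fraction by which the cheaper chip undercuts the reference
        price (0.70 means 70% lower).
    """
    if relative_performance <= 0:
        raise ValueError("relative_performance must be positive")
    if not 0.0 <= price_discount < 1.0:
        raise ValueError("price_discount must be in [0, 1)")
    # Reference perf per dollar is 1 / 1; the cheaper chip's is perf / (1 - discount).
    return relative_performance / (1.0 - price_discount)


# Equal performance at a 70% lower price.
ratio = perf_per_dollar_ratio(relative_performance=1.0, price_discount=0.70)
print(f"Performance per dollar advantage: {ratio:.2f}x")
```

Under those assumptions the advantage is a little over three times, which also shows how much room Intel would have to cut prices before the gap closes.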
That being said, Intel is pricing their product as the undisputed market leader.
They could very easily lower pricing to rival the ThunderX2 lineup in performance per dollar if market share ever did slip.
Cavium ThunderX2 Power Consumption
While official power figures such as TDP have not been forthcoming, initial community tests of the Cavium ThunderX2 have shown both idle and peak power draw exceeding that of Intel processors.
In some cases this excess is quite significant.
One tester noted a 300W idle platform power draw, and another claimed over 800W at peak load.
While this has inspired criticism, it’s exceedingly common in pre-release stages for hardware to have efficiency or feature kinks to work out.
It remains to be seen what the official benchmarks will look like once the chips have gone mainstream.
That being said, it’s unlikely that firmware changes will produce more than marginal power efficiency improvements.
ThunderX2 Compute Performance
General Purpose Computing
For general purpose computing purposes, the ThunderX2 has limitations.
It performs well on multi-threaded tests, but falls behind even the E5-2699 V3 on more single-threaded benchmarks, such as UnixBench Whetstone.
Like the original ThunderX before it, the ThunderX2 is better suited to web server, database, or other workloads with low instruction-level parallelism (ILP).
Unlike the 8-node AMD EPYC platform, the ThunderX2 uses two NUMA nodes like Intel, which is likely more attractive if you’re hoping to avoid fabric hops with NUMA-unaware apps.
However, aside from price, there’s nothing here for standard workloads that would have data centers decommissioning every server to hop on the ARM train.
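If you want to see how many NUMA nodes a given Linux server actually exposes, the kernel lists them under sysfs. The sketch below counts the `nodeN` directories; the helper name is made up for illustration, and it simply returns 0 on systems where that sysfs directory is absent.

```python
import re
from pathlib import Path


def count_numa_nodes(base: str = "/sys/devices/system/node") -> int:
    """Count NUMA nodes by looking for node0, node1, ... directories in sysfs."""
    root = Path(base)
    if not root.is_dir():
        # Not Linux, or NUMA information is not exposed on this system.
        return 0
    return sum(
        1
        for entry in root.iterdir()
        if entry.is_dir() and re.fullmatch(r"node\d+", entry.name)
    )


if __name__ == "__main__":
    print(f"NUMA nodes visible to the OS: {count_numa_nodes()}")
```

Based on the node counts described above, a dual-socket ThunderX2 or Xeon system would be expected to report two nodes, and the 8-node EPYC platform eight.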
High Performance Computing
As far as High Performance Computing (HPC) goes, ThunderX2 is impressive.
Cavium’s ThunderX2 performs very well in benchmarks whose performance tracks memory bandwidth, such as OpenFOAM, STREAM, CloverLeaf, and TeaLeaf.
However, it falls short on compute-bound codes, such as GROMACS and VASP.
Essentially, any test that relies heavily on the cache proves less promising.
The ThunderX2 shows lower L1 and L2 cache bandwidth, as well as lower floating-point throughput (FLOPS).
These limitations may be resolved in the future by ARM processors with Scalable Vector Extension (SVE) support and wider vector units.
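One way to reason about why the same chip can win memory-bound benchmarks and lose compute-bound ones is the roofline model, a standard back-of-the-envelope technique that the benchmarks above do not themselves use. The sketch below applies it to two hypothetical chips; every number is invented for illustration and none is a measured ThunderX2 or Xeon figure.

```python
def attainable_gflops(peak_gflops: float, mem_bw_gbs: float, intensity: float) -> float:
    """Roofline bound: the lower of the compute roof and the bandwidth roof.

    intensity is arithmetic intensity in FLOPs per byte moved from memory.
    """
    return min(peak_gflops, mem_bw_gbs * intensity)


# Hypothetical chips (assumed values, not vendor specifications).
CHIPS = {
    "more_bandwidth": {"peak_gflops": 500.0, "mem_bw_gbs": 160.0},
    "more_flops": {"peak_gflops": 1500.0, "mem_bw_gbs": 120.0},
}

# Hypothetical kernels, described only by their arithmetic intensity.
KERNELS = {
    # A triad-style loop: 2 FLOPs per 3 doubles (24 bytes), ignoring write-allocate.
    "stream_triad_like": 2 / 24,
    # Assumed intensity for a cache-friendly, compute-bound kernel.
    "compute_heavy": 20.0,
}


def best_chip(intensity: float) -> str:
    """Return the name of the chip with the higher roofline bound."""
    return max(
        CHIPS,
        key=lambda name: attainable_gflops(**CHIPS[name], intensity=intensity),
    )


for kernel, intensity in KERNELS.items():
    print(f"{kernel}: best on {best_chip(intensity)}")
```

At low intensity the bandwidth roof dominates and the chip with more memory bandwidth comes out ahead; at high intensity the compute roof dominates and the ordering flips, which mirrors the pattern described above.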
One often-presumed obstacle is that HPC codes are already so heavily optimized for x86 that newer ARM implementations are left with lackluster efficiency in comparison.
However, surprisingly few HPC codes are hyper-optimized for x86.
There is an advantage in some, such as GROMACS, but not to the extent some might assume.
Many apps don’t lend themselves to efficient use of wider vectors, so they never approach peak performance anyway.
Overall, the ThunderX2 lineup looks promising for HPC, and more specifically for the highly parallel designs of companies like Cray.
ThunderX2 Compatibility with Existing Apps
Most software vendors don’t support non-x86 architectures.
A significant number of cloud platforms are Linux-based, and thus could feasibly recompile for ARM.
However, many applications are optimized for specific processor lines.
It’s possible the ThunderX2 can gain market share on the coattails of its appealing price point.
In that case we may see more support from the usual suspects to help further its adoption.
One area that will certainly pose an obstacle is deployment into virtualized environments.
You can’t shut down a Xeon-based VM and boot it back up on an ARM system like you can on a different Xeon system.
This is no small thing, as virtualization is now the norm, with more than 75% of organizations using virtualized servers.
Does the ThunderX2 Have Effective Server Platform Options?
Several OEM and white box ODM vendors have jumped on board, with HPE, Cray, Inventec, and many others now providing X2 platforms.
The hardware is comparable, and doesn’t present any obstacles to hardware compatibility with storage systems or networking equipment.
Standard features such as a comparable web GUI and out-of-band IPMI management make the ThunderX2 more viable, with better inventory management options than previous offerings.
Is The ThunderX2 Vulnerable to Spectre?
Spectre is a problem inherent to speculative, out-of-order execution, so yes, the ThunderX2 should be just as vulnerable to Spectre as other chips.
You’ll want to employ the same protective measures, but thankfully the newest Linux patches don’t significantly alter performance.
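To check which mitigations a particular server is actually running, recent Linux kernels report per-vulnerability status under sysfs on both x86 and ARM. The sketch below reads those files; the helper name is made up for illustration, and it returns an empty result on kernels or operating systems that do not expose the directory.

```python
from pathlib import Path


def read_cpu_vulnerabilities(
    base: str = "/sys/devices/system/cpu/vulnerabilities",
) -> dict:
    """Map each vulnerability name the kernel reports to its status string."""
    root = Path(base)
    if not root.is_dir():
        # Older kernel or non-Linux system: nothing to report.
        return {}
    report = {}
    for entry in sorted(root.iterdir()):
        if not entry.is_file():
            continue
        try:
            report[entry.name] = entry.read_text().strip()
        except OSError:
            report[entry.name] = "unreadable"
    return report


if __name__ == "__main__":
    for name, status in read_cpu_vulnerabilities().items():
        print(f"{name}: {status}")
```

Entries such as `spectre_v1` and `spectre_v2` typically read "Not affected", "Vulnerable", or "Mitigation: ..." followed by the technique in use.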
Closing Thoughts
Cavium’s ThunderX2, and ARM in general, aren’t going to replace x86 any time soon.
Theoretically there’s no reason you couldn’t do something equivalent to x86 with ARM. There’s no reason you couldn’t just throw money at making larger ARM cores with equivalent prefetching and SIMD.
But Intel has thrown billions of dollars at that concept, and history would claim that beating a corporate behemoth at their own game is generally not a winning proposition.
The ThunderX2 lags behind on more general purpose workloads.
There are certainly enough of those in the modern server environment to keep it from becoming the de facto choice.
Early TDP and power consumption figures don’t help either.
However, this architecture is very effective at handling specific types of workloads.
With its memory bandwidth, it can provide a unique solution for memory-bound app environments.
In the past we saw GPUs skyrocket into use as they fit their niche with iterative algorithms.
Perhaps ARM can similarly find its own role in HPC and other niche use cases.
With Moore’s law long dead, CPU manufacturers large and small will focus more on parallel processing than on speeding up individual cores. The new paradigm has been set. The progress will be interesting to behold.