NVIDIA smashes big data analytics benchmark by nearly 20x
NVIDIA has announced that it has beaten the previous record for TPCx-BB, a standard big data analytics benchmark, by a wide margin.
Utilising the RAPIDS suite of open-source data science software libraries, NVIDIA used 16 of its DGX A100 systems designed for AI workloads to power through the test in 14.5 minutes, compared to the previous record of 4.7 hours - representing a speed increase of 19.5 times.
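The speed-up figure can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function and variable names are hypothetical, and the inputs are the rounded times quoted above rather than the official benchmark timings.

```python
def speedup(baseline_minutes: float, new_minutes: float) -> float:
    """Return how many times faster the new run is than the baseline."""
    return baseline_minutes / new_minutes


# Rounded figures from the article (assumed, not the exact reported timings).
previous_record_minutes = 4.7 * 60  # 4.7 hours expressed in minutes
rapids_minutes = 14.5

factor = speedup(previous_record_minutes, rapids_minutes)
print(f"Speed-up: {factor:.1f}x")
```

With these rounded inputs the result is about 19.4, marginally below the quoted 19.5. The gap is most likely down to rounding in the published times.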
The improvement comes down to NVIDIA’s specialisation in graphics processing units (GPUs) rather than CPUs, a technology it has long developed for both consumer and enterprise settings. A GPU can carry out far more operations in parallel than a CPU. For large analytics workloads the CPU therefore becomes a bottleneck, as it cannot process anywhere near as much data at once as a GPU.
The achievement has ramifications for the use of AI and machine learning in data analytics, with the TPCx-BB benchmark specifically set up to measure the performance of queries on both structured and unstructured data, mirroring real-world applications. Classic use cases such as inventory management, price analysis, sales analysis, recommendation systems, customer segmentation and sentiment analysis are all part of the test.
In a blog post, Nick Becker and Paul Mahler, two members of the RAPIDS open-source GPU data science project, said: “RAPIDS provides a new distributed computing paradigm for the TPCx-BB benchmark by running workloads on GPUs at both 1 and 10 TB scale. By working with the open source community, this sets a new bar for big data analytics performance.
“This work represents a true paradigm shift in business computing. RAPIDS was built on the idea that Graphics Processing Units could be used in the world of ETL and Machine Learning. This work represents an important step in demonstrating that this idea is not only valid but will soon be the standard.”