Compiler supports multi-threading on CPUs to accelerate design process

The latest compiler for heterogeneous compute systems announced by CacheQ Systems supports multi-threading acceleration for CPUs with multiple physical cores. It offers software developers the ability to “write once, accelerate anywhere,” says the company.

The compiler, part of the CacheQ Compiler Collection, eliminates manual code rewriting and the use of threading libraries or complex parallel execution APIs such as OpenMP or MPI. It takes single-threaded C code and generates executables that can run on CPUs, leveraging many physical x86 cores with or without hyper-threading, as well as Arm and RISC-V cores.
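As an illustration of the kind of input described above, here is a minimal sketch of an ordinary single-threaded C loop. The function name `scale_and_add` and its parameters are hypothetical and are not taken from CacheQ documentation. The comment shows the OpenMP directive a developer would conventionally add by hand to parallelise the loop, which is the manual step CacheQ says its compiler removes.

```c
#include <stddef.h>

/* Hypothetical example: y[i] = a * x[i] + y[i], written as plain
 * single-threaded C. Each iteration is independent of the others,
 * which is what makes the loop a candidate for multi-core execution. */
void scale_and_add(size_t n, float a, const float *x, float *y)
{
    /* With OpenMP, a developer would add this directive by hand:
     *     #pragma omp parallel for
     * and build with an OpenMP option such as -fopenmp in GCC. */
    for (size_t i = 0; i < n; i++) {
        y[i] = a * x[i] + y[i];
    }
}
```

How the CacheQ compiler itself distributes such a loop across cores is not described in the announcement, so the sketch shows only the source side.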

Users can generate code for multi-core processors of the same or different architectures and benchmark usage with runtime variables. They can add cores to trade performance against power usage, or reduce the number of cores in use and allocate the remainder to other processes to achieve better performance per watt, says CacheQ.

The result is an acceleration of more than 486 per cent over single-thread execution on x86 processors with 12 logical cores, based on benchmarks of the Black-Scholes financial algorithm, a model used to price options. On an Apple M1 processor with eight Arm cores, the code runs 400 per cent faster than single-threaded code built with the GNU Compiler Collection (GCC), says the company.

“This is a game changer for software developers to take full advantage of parallel processing power without spending years learning to code with OpenMP or MPI,” says Clay Johnson, CEO of CacheQ Systems. A single-threaded algorithm can now be compiled and accelerated to target any CPU with two or more cores, he adds.

CacheQ enables software developers to easily develop and deploy custom hardware accelerators for heterogeneous compute systems including FPGAs, CPUs and GPUs. The CacheQ Compiler Collection is modelled after the GCC tool suite, including a user interface similar to common open source compilers. It requires limited code modification, which shortens development time and improves system quality, says CacheQ.

The tool suite enables compilation, linting and error detection, performance prediction, profiling, debug and visualisation of the generated virtual engine. It supports target hardware including single and multicore processors, as well as heterogeneous compute systems with FPGA accelerators connected to x86 and Arm processors.

The CacheQ Compiler Collection supports C code and C++ through hybrid access of an exported function call.

Execution on the M1 processor with two cores outperformed the x86 chip with 11 cores, demonstrating a cost-per-watt advantage. The Apple M1 processor with four cores performed 210 per cent faster than the x86 with 12 cores, reports CacheQ. Overall, using the CacheQ Compiler Collection, the M1 performed approximately 1,476 per cent faster than single-threaded GCC-compiled code running on x86.
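The percentages quoted in this article can be converted to speedup factors with simple arithmetic. The sketch below assumes the conventional reading, in which "N per cent faster" means a speedup of 1 + N/100 relative to the baseline; the announcement does not state how the figures were calculated. The function name `speedup_from_percent_faster` is hypothetical.

```c
/* Convert "N per cent faster" into a speedup factor relative to a
 * baseline of 1.0x. Assumes the conventional reading of the phrase;
 * the announcement does not confirm this is how CacheQ computed it. */
double speedup_from_percent_faster(double percent)
{
    return 1.0 + percent / 100.0;
}
```

Under that assumption, 486 per cent faster corresponds to roughly 5.86 times the baseline speed, and 1,476 per cent faster to roughly 15.76 times.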

All benchmarks were run on the same code compiled for different targets. The x86 benchmarks were performed on an Intel i7-8700K CPU running at 3.7GHz, with six physical cores and hyper-threading providing 12 logical cores, running Ubuntu 18.04. Apple M1 benchmarks were captured in a Parallels VM running a native Arm Ubuntu 20.04 image.

The CacheQ compiler tools are shipping now through a limited access program.

http://www.cacheq.com
