Software accelerates FPGA development for edge computing
Software developers can realise up to a 100x performance increase and cut development time by up to a factor of 15 using CacheQ’s QCC Acceleration Platform, according to the company.
The distributed heterogeneous compute development environment provides a high-level language (HLL) software development platform for heterogeneous compute architectures. At its heart is the CacheQ Virtual Machine (CQVM), a complete representation of the application that can be analysed, partitioned, optimised and targeted to a variety of compute engines. HLL source code is translated into the CQVM, which is then optimised and partitioned. The final output is a set of compute executables generated from the partitioned CQVM.
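The flow described above (HLL code translated into one whole-application representation, which is optimised and partitioned before executables are produced) can be illustrated with a toy model. This is a hypothetical sketch, not CacheQ’s API or data format: the classes `Region` and `AppIR` and the functions `lower`, `optimise`, `partition` and `emit` are invented for illustration, and the "optimisation" and "partitioning" rules are deliberately trivial.

```python
from dataclasses import dataclass, field

# Hypothetical stand-in for a whole-application representation such as the CQVM.
# Here it is just a flat list of code regions; the real CQVM is not described in detail.
@dataclass
class Region:
    name: str
    is_loop: bool
    reachable: bool = True
    target: str = "unassigned"

@dataclass
class AppIR:
    regions: list = field(default_factory=list)

def lower(source_regions):
    """Step 1: build the IR from (name, is_loop, reachable) tuples standing in for HLL code."""
    return AppIR([Region(n, loop, reach) for n, loop, reach in source_regions])

def optimise(ir):
    """Step 2: stand-in for dead-code elimination, dropping unreachable regions."""
    return AppIR([r for r in ir.regions if r.reachable])

def partition(ir):
    """Step 3: naive illustrative rule: loops go to the FPGA, everything else to the CPU."""
    for r in ir.regions:
        r.target = "fpga" if r.is_loop else "cpu"
    return ir

def emit(ir):
    """Step 4: one 'executable' per compute engine, listing the regions it runs."""
    executables = {}
    for r in ir.regions:
        executables.setdefault(r.target, []).append(r.name)
    return executables

program = [
    ("init", False, True),
    ("filter_loop", True, True),
    ("debug_dump", False, False),  # never reached, so removed by optimise()
    ("report", False, True),
]
print(emit(partition(optimise(lower(program)))))
# {'cpu': ['init', 'report'], 'fpga': ['filter_loop']}
```

The point of the structure is that analysis and partitioning operate on the complete representation before any engine-specific output exists, so the partitioning rule can be changed without touching the source.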
Because the CQVM represents the whole application, extensive analysis and optimisation can be done before the compute executables are generated. Software developers can run performance simulations, profile the complete virtual machine, view the CQVM to examine partitioning results and hot spots, and analyse compute resource utilisation, all of which shortens design cycles.
Partitioning an application across compute engines normally means writing engine-specific code. To avoid this, the QCC Acceleration Platform lets software developers write a single application, which it automatically partitions across compute elements that can be any combination of processors and FPGAs. Both automatic and user-guided partitioning are supported, to improve performance and reduce development time.
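CacheQ does not say how its automatic partitioner decides what runs where. As an illustration only, the sketch below uses a simple greedy heuristic: given hypothetical profile data (measured CPU time, estimated FPGA time and an estimated FPGA resource cost per region), it offloads the regions that save the most time per unit of FPGA resource until a budget is used up. The `pinned` argument mimics user-guided partitioning by forcing placements. All names, units and numbers are assumptions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Profile:
    name: str
    cpu_ms: float    # measured time on the processor
    fpga_ms: float   # estimated time if offloaded (hypothetical estimate)
    fpga_luts: int   # estimated FPGA resource cost, must be > 0 (hypothetical unit)

def greedy_partition(profiles, lut_budget, pinned=None):
    """Greedy CPU/FPGA split (illustrative heuristic, not CacheQ's algorithm).
    `pinned` maps region name -> 'cpu' or 'fpga' to model user guidance."""
    pinned = pinned or {}
    plan, remaining = {}, lut_budget
    # User-guided placements are honoured first; pinned FPGA regions consume budget.
    for p in profiles:
        if p.name in pinned:
            plan[p.name] = pinned[p.name]
            if pinned[p.name] == "fpga":
                remaining -= p.fpga_luts
    # Only regions that actually run faster on the FPGA are offload candidates.
    candidates = [p for p in profiles if p.name not in pinned and p.cpu_ms > p.fpga_ms]
    # Rank by time saved per LUT, best first.
    candidates.sort(key=lambda p: (p.cpu_ms - p.fpga_ms) / p.fpga_luts, reverse=True)
    for p in candidates:
        if p.fpga_luts <= remaining:
            plan[p.name] = "fpga"
            remaining -= p.fpga_luts
    # Everything not offloaded stays on the processor.
    for p in profiles:
        plan.setdefault(p.name, "cpu")
    return plan

profiles = [
    Profile("parse", cpu_ms=5.0, fpga_ms=8.0, fpga_luts=2_000),
    Profile("fft", cpu_ms=40.0, fpga_ms=4.0, fpga_luts=30_000),
    Profile("filter", cpu_ms=25.0, fpga_ms=2.0, fpga_luts=10_000),
    Profile("checksum", cpu_ms=6.0, fpga_ms=1.0, fpga_luts=8_000),
]
print(greedy_partition(profiles, lut_budget=35_000))
print(greedy_partition(profiles, lut_budget=35_000, pinned={"fft": "fpga"}))
```

With these made-up numbers the automatic run offloads `filter` and `checksum` (28 ms saved), although offloading `fft` alone would also fit the budget and save 36 ms; the ratio-based greedy rule is not optimal, which is one reason a tool might also accept user guidance such as pinning `fft` to the FPGA.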
The platform also pipelines code automatically to accelerate development. For example, CacheQ cites an execution time of (N + C)/(clock rate) for fully pipelined execution on an FPGA. For further acceleration, pipelined loops can be unrolled through a simple command-line option with no code modification, adds CacheQ.
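The article does not define N or C. A common reading, assumed here, is that N is the number of loop iterations and C the pipeline depth in clock cycles: once the pipeline fills, one iteration starts every cycle, so a loop takes N + C cycles instead of roughly N × C unpipelined. The sketch below works through that arithmetic, and also assumes that unrolling by a factor U (with enough hardware and memory ports to start U iterations per cycle) reduces the issue cycles to ceil(N/U). The figures are illustrative, not CacheQ measurements.

```python
import math

def pipelined_time_s(n_iters: int, depth_cycles: int, clock_hz: float, unroll: int = 1) -> float:
    """(ceil(N / U) + C) / f_clk: one (unrolled) iteration issued per cycle, plus pipeline fill.
    The meanings of N, C and U are assumptions; the article only gives (N + C)/(clock rate)."""
    issue_cycles = math.ceil(n_iters / unroll)
    return (issue_cycles + depth_cycles) / clock_hz

def sequential_time_s(n_iters: int, depth_cycles: int, clock_hz: float) -> float:
    """Unpipelined baseline (simplifying assumption): each iteration takes C cycles
    and the next one cannot start until it finishes, giving N * C cycles."""
    return n_iters * depth_cycles / clock_hz

N, C, F_CLK = 1_000_000, 50, 250e6  # illustrative values only
print(f"pipelined:   {pipelined_time_s(N, C, F_CLK) * 1e3:.4f} ms")            # 4.0002 ms
print(f"unrolled x4: {pipelined_time_s(N, C, F_CLK, unroll=4) * 1e3:.4f} ms")  # 1.0002 ms
print(f"sequential:  {sequential_time_s(N, C, F_CLK) * 1e3:.4f} ms")           # 200.0000 ms
```

For large N the fill term C becomes negligible, so the pipelined loop approaches one iteration per clock cycle and unrolling by U approaches a further U-fold speed-up, provided the memory system can supply U operands per cycle.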
Traditional FPGA development requires users to rewrite their code and guarantee predictable memory access. CacheQ instead integrates the memory subsystem tightly with the application code, to improve performance and reduce development time.
The Acceleration Platform’s proprietary multi-port, arbitrated, cached memory subsystem integrates with the CQVM to provide up to 100 memory ports and terabytes per second of memory bandwidth. In addition, malloc (C dynamic memory allocation) and complex pointer references are supported.
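CacheQ does not describe how its memory subsystem arbitrates between ports. Round-robin arbitration is one common policy for sharing a memory resource among many requesters, and the sketch below models it as an assumption: each cycle the arbiter grants one requesting port, then moves priority to the next port so no requester starves. The class name and the one-grant-per-cycle simplification are hypothetical; a real multi-port cache would typically be banked and serve several ports per cycle.

```python
class RoundRobinArbiter:
    """Grants one requesting port per cycle with rotating priority.
    Round-robin is an assumed policy; the article does not say which one CacheQ uses."""

    def __init__(self, num_ports: int):
        self.num_ports = num_ports
        self.next_port = 0  # port with the highest priority in the next cycle

    def grant(self, requests):
        """`requests` is a set of port indices requesting access this cycle.
        Returns the granted port, or None if no port is requesting."""
        for offset in range(self.num_ports):
            port = (self.next_port + offset) % self.num_ports
            if port in requests:
                # The port after the winner gets top priority next cycle.
                self.next_port = (port + 1) % self.num_ports
                return port
        return None

arb = RoundRobinArbiter(num_ports=4)
# Ports 0, 2 and 3 request every cycle; each is served in turn.
print([arb.grant({0, 2, 3}) for _ in range(4)])  # [0, 2, 3, 0]
```

Rotating priority keeps the worst-case wait for any port bounded by the number of ports, which matters when dozens of pipelined loops share one memory system.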