Accelerator card by Untether AI delivers up to two Peta OPS for AI inference
AI inference workloads require increasing amounts of compute resources, far outstripping the gains available from traditional CPU and GPU architectures, says Untether AI. The company has unveiled its tsunAImi accelerator cards, which are powered by its runAI200 devices. Untether AI says its use of at-memory computation breaks through the barriers of traditional von Neumann architectures to offer industry-leading compute density with power and price efficiency.
AI accelerators are used in data centres, and Untether AI says it focuses on inference acceleration, reducing the transfer of weights and activations between external memory, on-chip caches and the compute elements to maximise power efficiency. As a result, Untether AI is able to deliver two Peta operations per second (OPS) in a standard PCI-Express card form factor.
“For AI inference in cloud and datacenters, compute density is king,” said Arun Iyengar, CEO of Untether AI. “Untether AI is ushering in the Peta OPS era to accelerate AI inference workloads at scale with unprecedented efficiency,” he added.
The tsunAImi accelerator card is based on the company’s runAI200 devices, which are tailored for inference acceleration and operate on integer data types at a batch size of 1. The at-memory compute architecture pairs each memory bank of 385Kbytes of SRAM with a 2D array of 512 processing elements. With 511 banks per chip, each device offers 200Mbytes of memory and operates at up to 502 Tera OPS in “sport” mode. It may also be configured for maximum efficiency, offering eight TOPS per watt in “eco” mode.
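As a back-of-envelope check, the per-chip figures above follow from the bank counts quoted in the article; the sketch below simply multiplies them out (the rounding of ~197Mbyte up to the quoted 200Mbyte is an assumption on our part).

```python
# Sanity-check the runAI200 per-chip figures quoted in the article.
BANKS_PER_CHIP = 511        # memory banks per device
SRAM_PER_BANK_KB = 385      # Kbytes of SRAM per bank
PES_PER_BANK = 512          # processing elements per bank

total_sram_mb = BANKS_PER_CHIP * SRAM_PER_BANK_KB / 1000  # decimal Mbytes
total_pes = BANKS_PER_CHIP * PES_PER_BANK

print(f"On-chip SRAM per device: {total_sram_mb:.1f} Mbyte")  # ~196.7, quoted as 200
print(f"Processing elements per device: {total_pes:,}")       # 261,632
```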
The runAI200 devices are manufactured in a cost-effective, mainstream 16nm process, says Untether AI.
The tsunAImi accelerator cards are powered by four runAI200 devices, providing more than twice the compute of any currently available PCIe card, says Untether AI. This compute power translates into over 80,000 frames per second of ResNet-50 v1.5 throughput at batch=1, three times the throughput of its nearest competitor. For natural language processing, tsunAImi accelerator cards can process more than 12,000 queries per second (qps) of BERT-base, four times faster than any announced product.
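The card-level headline figure follows directly from the per-device numbers: four devices at 502 Tera OPS each clears the two Peta OPS mark. A minimal sketch of that arithmetic, using only figures from the article:

```python
# How four runAI200 devices reach the card's "two Peta OPS" headline figure.
DEVICES_PER_CARD = 4
TOPS_PER_DEVICE_SPORT = 502   # "sport" mode, integer operations

card_tops = DEVICES_PER_CARD * TOPS_PER_DEVICE_SPORT
card_peta_ops = card_tops / 1000  # Tera -> Peta

print(f"Card throughput: {card_tops} Tera OPS = {card_peta_ops:.3f} Peta OPS")  # 2008 TOPS
```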
The Untether AI imAIgine™ software development kit (SDK) provides an automated path to running networks, with push-button quantisation, optimisation, physical allocation and multi-chip partitioning. The imAIgine SDK frees data scientists from low-level optimisation tasks, letting them spend their time developing models instead. It also provides an extensive visualisation toolkit, a cycle-accurate simulator and an easily integrated runtime API.
The imAIgine SDK is currently in early access with select customers and partners. The tsunAImi accelerator card is sampling now and will be commercially available in Q1 2021.