Hardware acceleration for energy-efficient AI on edge devices

In the future, smart sensors will use AI to process their measured values locally or to adapt to changes in their environment. In the "Green ICT @ FMD" project, scientists at Fraunhofer IMS are working on the energy-efficient implementation of AI algorithms. To this end, the RISC-V processor AIRISC (www.airisc.de) has been enhanced with instruction set extensions and coprocessors for highly efficient computation of neural networks. For an application in the field of medical data evaluation, a speedup of more than a factor of 7 was achieved with only around 10 % overhead in area and energy requirements.

A contribution from:
Alexander Stanitzki
Head of Business Unit Industry
Fraunhofer Institute for Microelectronic Circuits and Systems IMS

The basic version of the RISC-V processor AIRISC for embedded and sensor applications from Fraunhofer IMS has been available as a free core on GitHub since the beginning of 2022 and can be downloaded there [1]. The free version is released under the permissive Solderpad license and comes with ready-made example projects for various FPGA development boards. This makes it easy to evaluate the core and to use it in customer-specific and commercial applications. Fraunhofer IMS is also researching and developing application-specific extensions, for example for power electronics, medical applications or image processing.

Figure 1: Block diagram of the AIRISC RISC-V core

The latest result of this work is a package of accelerators for highly efficient neural network inference. It enables the use of modern AI algorithms even on severely energy-constrained hardware, for example in energy-autonomous sensor systems. The extension package consists of the following components:

A collection of hardware accelerators for common activation functions (AF). As of March 2022, these include tanh, sigmoid, softsign and softmax based on the exponential function.
Parallel execution of several multiply and add operations (multiply-accumulate, as used in matrix multiplications), currently for the data types 16-bit integer (2-fold parallel) and 8-bit integer (4-fold parallel). The extension replaces the standard ALU; because it is integrated directly into the processor pipeline, it delivers maximum speed with the minimum possible hardware overhead.
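
The 4-fold parallel 8-bit multiply-accumulate can be illustrated with a small software emulation. The following Python sketch does not reproduce the AIRISC instruction set: the function names, the lane order (lane 0 in the lowest byte) and the unbounded Python accumulator are assumptions made purely for illustration. It only shows how four 8-bit products are accumulated per operation once the operands are packed into a 32-bit word.

```python
def pack_int8x4(values):
    """Pack four signed 8-bit integers into one 32-bit word (lane 0 = low byte)."""
    assert len(values) == 4
    word = 0
    for lane, value in enumerate(values):
        assert -128 <= value <= 127
        word |= (value & 0xFF) << (8 * lane)
    return word


def sign_extend(value, bits):
    """Interpret the low `bits` bits of `value` as a two's-complement number."""
    sign_bit = 1 << (bits - 1)
    return (value ^ sign_bit) - sign_bit


def packed_mac_int8x4(acc, a_word, b_word):
    """Hypothetical packed MAC: acc += sum of the four lane-wise 8-bit products."""
    for lane in range(4):
        a = sign_extend((a_word >> (8 * lane)) & 0xFF, 8)
        b = sign_extend((b_word >> (8 * lane)) & 0xFF, 8)
        acc += a * b
    return acc


def dot_int8(xs, ws):
    """Dot product that issues one packed MAC per group of four elements."""
    assert len(xs) == len(ws)
    pad = (-len(xs)) % 4  # zero-pad so the length is a multiple of four
    xs = list(xs) + [0] * pad
    ws = list(ws) + [0] * pad
    acc = 0
    for i in range(0, len(xs), 4):
        acc = packed_mac_int8x4(acc, pack_int8x4(xs[i:i + 4]), pack_int8x4(ws[i:i + 4]))
    return acc
```

In this model, a dot product over 13 inputs needs four packed operations instead of 13 scalar multiply-accumulate steps, and each packed operand is fetched as a single 32-bit word.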

Figure 2: Schematic representation of the test network

A feed-forward neural network with 13 neurons in the input layer, a hidden layer with 17 neurons and two neurons in the output layer serves as a benchmark for the accelerators described. The network originates from a real application for examining ECG data for the presence of atrial fibrillation.
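
For orientation, a network of this shape (13 inputs, 17 hidden neurons, 2 outputs) can be written down in a few lines. The sketch below is a floating-point reference in plain Python, not the Fraunhofer IMS implementation: the choice of tanh in the hidden layer and softmax at the output, the random weights and all function names are assumptions, since the article does not state which activation functions or parameters the ECG network uses.

```python
import math
import random

LAYER_SIZES = (13, 17, 2)  # input, hidden, output (sizes taken from the article)


def softsign(x):
    return x / (1.0 + abs(x))


def sigmoid(x):
    # Numerically stable form that avoids overflow for large negative x.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def softmax(values):
    peak = max(values)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def dense(inputs, weights, biases):
    """One fully connected layer: a multiply-accumulate loop per output neuron."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def random_params(seed=0):
    """Placeholder weights; a real network would load trained parameters."""
    rng = random.Random(seed)
    params = []
    for n_in, n_out in zip(LAYER_SIZES, LAYER_SIZES[1:]):
        weights = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
        biases = [rng.uniform(-1, 1) for _ in range(n_out)]
        params.append((weights, biases))
    return params


def forward(inputs, params):
    (w1, b1), (w2, b2) = params
    hidden = [math.tanh(z) for z in dense(inputs, w1, b1)]  # assumed activation
    return softmax(dense(hidden, w2, b2))                   # assumed output stage


probabilities = forward([0.1] * LAYER_SIZES[0], random_params())
```

The inner loop of `dense` and the activation calls are the two kinds of work that the accelerators described above target.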

Figure 3: System clock cycles required for inference with and without hardware acceleration

With the hardware accelerators, neural network inference runs more than 7 times faster (Fig. 3). In addition to the parallel execution of the matrix multiplications, a large proportion of the memory accesses is saved. Fig. 4 shows the overhead in hardware area and power dissipation that the described accelerators entail. For comparison, it also shows the corresponding figures for the hardware floating-point unit (FPU), which is likewise available for the AIRISC.
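
A back-of-the-envelope count shows how much of the speedup the packed arithmetic alone could explain. The sketch below counts multiply-accumulate operations for the 13-17-2 test network under the assumption that each neuron's inputs are processed in groups of `lanes` elements, with the last group padded. The function name and this counting model are illustrative assumptions; they ignore loads, stores, loop overhead and activation functions, so the result is not a cycle count.

```python
import math


def mac_instructions(layer_sizes, lanes):
    """Count MAC operations when `lanes` products are handled per operation."""
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_out * math.ceil(n_in / lanes)
    return total


scalar = mac_instructions((13, 17, 2), lanes=1)  # 13*17 + 17*2 = 255
packed = mac_instructions((13, 17, 2), lanes=4)  # 17*4 + 2*5 = 78
ratio = scalar / packed                          # about 3.3
```

Under these assumptions the packed 8-bit arithmetic by itself accounts for a factor of roughly 3.3, which is consistent with the statement that the saved memory accesses also contribute to the measured factor of more than 7.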

Figure 4: Overhead of the hardware accelerators and of the available floating-point unit

The large speed advantage comes at the cost of only a moderate hardware overhead. Because the same workload needs far fewer clock cycles, the operating frequency can be reduced significantly, which saves a considerable proportion of the system's power consumption. Details were presented at the 5th meeting of the Duisburg RISC-V group. The presentation is available on the YouTube channel of RISC-V International [2]. The possible energy savings through the use of the developed chip, especially in comparison with standard platforms, will be examined in detail in the course of the Green ICT @ FMD project.
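
The link between fewer clock cycles and lower power can be made concrete with the textbook model for dynamic CMOS power, P = a * C * V^2 * f (activity factor, switched capacitance, supply voltage, clock frequency). The sketch below is an estimate under stated assumptions, not a measurement: it assumes that the inference rate stays fixed, that the clock can be lowered by the full speedup factor, that the accelerator overhead scales dynamic power by (1 + overhead), and it ignores static (leakage) power. The function name and parameters are hypothetical.

```python
def relative_dynamic_power(speedup, power_overhead, voltage_ratio=1.0):
    """Estimate P_new / P_old for the same inference rate.

    speedup:        factor by which the required clock cycles shrink
    power_overhead: additional power of the accelerators (0.10 for 10 %)
    voltage_ratio:  V_new / V_old, 1.0 if the supply voltage is unchanged
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return (1.0 + power_overhead) * voltage_ratio ** 2 / speedup


# Rounded figures from the article: factor 7 speedup, about 10 % overhead.
estimate = relative_dynamic_power(speedup=7, power_overhead=0.10)
```

With these rounded inputs the model gives a dynamic power of about 16 % of the baseline at an unchanged supply voltage. The actual savings depend on leakage, memory and the surrounding system.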

[1] https://github.com/Fraunhofer-IMS/airisc_core_complex

[2] https://www.youtube.com/watch?v=chV6EaBcVKw&t=730s

