Hardware-accelerated frameworks and libraries with FPGAs

15 October, 2020

Introduction
In the previous articles, we covered Hardware Acceleration with FPGAs, the key concepts behind the acceleration they provide, and hardware acceleration applications with FPGAs. In this final installment of the series, we focus on hardware-accelerated libraries and frameworks for FPGAs, which require zero changes to an application's code. We will review the alternatives available for Machine and Deep Learning, Image and Video Processing, and Databases.

Options for development with FPGAs
Historically, working with FPGAs has been associated with the need for a hardware developer, typically an electronics engineer, and with specialized tools and Hardware Description Languages (HDLs) such as VHDL and Verilog (concurrent rather than sequential languages), all very different from those used in software development. In recent years a new class of application has emerged, acceleration in data centers, which aims to close the gap between the hardware and software domains for computationally demanding algorithms that process large volumes of data.

A first level of abstraction replaces the typical HDL with a subset of C/C++ combined with OpenCL, bringing development into a more familiar environment for a software developer. Basic building blocks (primitives) are provided for mathematics, statistics, linear algebra, and digital signal processing (DSP). However, this alternative still requires deep knowledge of the underlying hardware to achieve significant acceleration and high performance.
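To give a feel for this kernel-style programming model, here is a minimal host-side sketch using PyOpenCL. It is only an illustration of the OpenCL concepts (buffers, kernels, command queues); a real FPGA flow compiles the kernel offline with the vendor toolchain, so treat the details as assumptions rather than a recipe.

```python
# Minimal vector-add sketch with PyOpenCL, illustrating the host/kernel split.
# On an FPGA flow the kernel would be compiled ahead of time by the vendor
# tools; here it is built at runtime just to show the programming model.
import numpy as np
import pyopencl as cl

a = np.random.rand(1024).astype(np.float32)
b = np.random.rand(1024).astype(np.float32)

ctx = cl.create_some_context()      # pick any available OpenCL device
queue = cl.CommandQueue(ctx)
mf = cl.mem_flags

a_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=a)
b_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=b)
out_buf = cl.Buffer(ctx, mf.WRITE_ONLY, a.nbytes)

kernel = """
__kernel void vadd(__global const float *a,
                   __global const float *b,
                   __global float *out) {
    int i = get_global_id(0);
    out[i] = a[i] + b[i];
}
"""
prg = cl.Program(ctx, kernel).build()
prg.vadd(queue, a.shape, None, a_buf, b_buf, out_buf)

result = np.empty_like(a)
cl.enqueue_copy(queue, result, out_buf)
assert np.allclose(result, a + b)
```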

Secondly, there are domain-specific accelerated libraries for solutions in finance, databases, image and video processing, data compression, security, and so on. They are plug-and-play and can be invoked directly through an API from applications written in C/C++ or Python, requiring only that the "common" libraries be replaced with their accelerated versions.
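The drop-in pattern usually looks like the sketch below. The module name accel_blas and its NumPy-compatible interface are hypothetical, used only to illustrate the idea; real vendors ship their own packages.

```python
# Hypothetical drop-in replacement: swap a "common" library for an
# accelerated one. "accel_blas" is an invented name used only to show the
# pattern; the application code that follows does not change.
try:
    import accel_blas as xp   # FPGA-accelerated, NumPy-compatible (hypothetical)
except ImportError:
    import numpy as xp        # fall back to the standard CPU implementation

a = xp.random.rand(4096, 4096)
b = xp.random.rand(4096, 4096)
c = xp.matmul(a, b)           # same call either way; only the import changed
```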

Finally, and these are the focus of this article, there are open-source libraries and frameworks that have been accelerated by third parties. They generally run as one or more Docker instances (on-premise or in the cloud) and allow us to accelerate Machine Learning applications, image processing, and databases, among others, without changing the code of our application.

Machine learning
Without a doubt, one of the most disruptive technological advances of recent years has been Machine Learning. Hardware acceleration brings clear benefits, given the high degree of parallelism and the enormous number of matrix operations involved. The gains appear both in the training phase of the model (reducing times from days to hours or minutes) and in the inference phase, enabling real-time applications.

Here is a small list of the accelerated options available:


TensorFlow is a platform for building and training neural networks using computational graphs. Created by Google, it is one of the leading Deep Learning frameworks.
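As a point of reference, here is a minimal TensorFlow sketch showing how a Python function is traced into a graph with tf.function; the shapes and values are arbitrary, and the code looks the same whether or not an accelerated build sits underneath.

```python
# Minimal TensorFlow sketch: tf.function traces the Python code into a graph
# that the runtime (and, with vendor plugins, an accelerator) can optimize.
import tensorflow as tf

@tf.function
def affine(x, w, b):
    return tf.matmul(x, w) + b

x = tf.random.normal([8, 4])
w = tf.random.normal([4, 2])
b = tf.zeros([2])
print(affine(x, w, b))
```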


Keras is a high-level API for neural networks written in Python. It works standalone or as an interface to frameworks such as TensorFlow (with which it is usually paired) or Theano. It was developed to enable rapid experimentation and offers a very gentle learning curve.
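A minimal Keras sketch, assuming random data and an arbitrary architecture, to show how little code a model requires:

```python
# Minimal Keras sketch: a small binary classifier trained on random data.
import numpy as np
from tensorflow import keras

x = np.random.rand(256, 20).astype("float32")
y = np.random.randint(0, 2, size=(256, 1))

model = keras.Sequential([
    keras.Input(shape=(20,)),
    keras.layers.Dense(32, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(x, y, epochs=3, batch_size=32, verbose=0)
```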


PyTorch is a Python library designed to perform numerical computation through tensor programming. It is mainly focused on the development of neural networks.
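A minimal PyTorch sketch of a tiny network and a single training step, with random data as a stand-in:

```python
# Minimal PyTorch sketch: tensors, a tiny network, and one training step.
import torch
import torch.nn as nn

x = torch.randn(64, 10)
y = torch.randint(0, 2, (64,))

model = nn.Sequential(nn.Linear(10, 16), nn.ReLU(), nn.Linear(16, 2))
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

optimizer.zero_grad()
loss = loss_fn(model(x), y)
loss.backward()
optimizer.step()
print(loss.item())
```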


A Deep Learning framework noted for its scalability, modularity, and high-speed data processing.


Scikit-learn is a Machine Learning library for Python, with tools for classification, regression, clustering, dimensionality reduction, model selection, and preprocessing. It is built on NumPy and SciPy for fast handling of N-dimensional arrays.
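A minimal scikit-learn sketch, using the bundled Iris dataset as an arbitrary example:

```python
# Minimal scikit-learn sketch: fit a classifier and score it on held-out data.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))
```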


XGBoost (Extreme Gradient Boosting) is one of the most widely used ML libraries; it is highly efficient, flexible, and portable.
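A minimal XGBoost sketch using its scikit-learn-style interface on random data:

```python
# Minimal XGBoost sketch: train a gradient-boosted classifier on random data.
import numpy as np
from xgboost import XGBClassifier

X = np.random.rand(500, 8)
y = np.random.randint(0, 2, 500)

model = XGBClassifier(n_estimators=50, max_depth=4)
model.fit(X, y)
print(model.predict(X[:5]))
```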


Spark MLlib is Apache Spark’s ML library, with scalable, parallelized algorithms that take advantage of the power of Spark. It includes the most common ML algorithms: classification, regression, clustering, collaborative filtering, dimensionality reduction, decision trees, and recommendation. It supports both batch and streaming data, and it also allows you to build, evaluate, and tune ML Pipelines.
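A minimal Spark MLlib sketch of a small pipeline; the toy DataFrame is an arbitrary stand-in for real data.

```python
# Minimal Spark MLlib sketch: a logistic-regression pipeline on a tiny DataFrame.
from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.feature import VectorAssembler

spark = SparkSession.builder.appName("mllib-sketch").getOrCreate()
df = spark.createDataFrame(
    [(1.0, 2.0, 0.0), (2.0, 1.0, 1.0), (3.0, 4.0, 0.0), (4.0, 3.0, 1.0)],
    ["f1", "f2", "label"],
)

assembler = VectorAssembler(inputCols=["f1", "f2"], outputCol="features")
lr = LogisticRegression(featuresCol="features", labelCol="label")
model = Pipeline(stages=[assembler, lr]).fit(df)
model.transform(df).select("label", "prediction").show()
spark.stop()
```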

Image and Video Processing
Image and Video Processing is another of the areas that benefits most from hardware acceleration, making it possible to work in real time on tasks such as video transcoding, live streaming, and image processing. Combined with Deep Learning, it is widely used in applications such as medical diagnostics, facial recognition, autonomous vehicles, and smart stores.


The most important library for Computer Vision and Image and Video Processing is OpenCV, an open-source project with more than 2,500 available functions. There is an accelerated version of its main methods, with more being added in each release.
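A minimal OpenCV sketch (the file names are placeholders); the same calls work whether the backend is the stock build or an accelerated one:

```python
# Minimal OpenCV sketch: load an image, convert to grayscale, detect edges.
# "input.jpg" and "edges.jpg" are placeholder file names.
import cv2

img = cv2.imread("input.jpg")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
edges = cv2.Canny(gray, threshold1=100, threshold2=200)
cv2.imwrite("edges.jpg", edges)
```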


For video processing tasks such as transcoding, encoding, decoding, and filtering, FFmpeg is one of the most widely used tools. Accelerated plugins exist, for example for decoding and encoding H.264 and other formats, and FFmpeg also supports the development of custom accelerated plugins.
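A minimal sketch that invokes FFmpeg from Python to transcode a clip to H.264; the file names are placeholders, and the same command can be run directly from a shell.

```python
# Minimal sketch: call FFmpeg to transcode a clip to H.264 with libx264.
# "input.mov" and "output.mp4" are placeholder file names.
import subprocess

subprocess.run(
    ["ffmpeg", "-y", "-i", "input.mov",
     "-c:v", "libx264", "-preset", "fast", "output.mp4"],
    check=True,
)
```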

Databases and analytics
Databases and Analytics receive increasingly complex workloads, mainly driven by advances in Machine Learning, which forces the data center to evolve. Hardware acceleration provides solutions for compute (for example, database engines that run at least 3 times faster) and for storage (via SSD drives that incorporate FPGAs in their circuitry, with direct access to the data for processing). Some of the databases that are accelerated, or in the process of being so, mainly open source and both SQL and NoSQL, are PostgreSQL, MySQL, Cassandra, and MongoDB.

In these cases, what is generally accelerated are the more complex low-level algorithms, such as data compression, compaction, and aspects related to networking, storage, and integration with the storage medium. The reported accelerations are on the order of 3 to 10 times, which may seem modest compared to improvements of up to 1,500 times in ML algorithms, but they are very significant for reducing the costs associated with the data center.
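The key point for the application is that its code stays the same. The sketch below, with placeholder connection details and a hypothetical sales table, queries PostgreSQL through psycopg2 exactly as it would without any acceleration underneath:

```python
# Minimal sketch: the application queries PostgreSQL through psycopg2 as usual;
# any acceleration happens below this API, so this code does not change.
# Connection parameters and the "sales" table are placeholders.
import psycopg2

conn = psycopg2.connect(host="localhost", dbname="analytics",
                        user="app", password="secret")
with conn, conn.cursor() as cur:
    cur.execute("SELECT region, SUM(amount) FROM sales GROUP BY region;")
    for region, total in cur.fetchall():
        print(region, total)
conn.close()
```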

Conclusions
Throughout this series of 4 articles, we learned what an FPGA is at the device level, how acceleration is achieved, and how to recognize a use case that can take advantage of it (computationally complex algorithms with large volumes of data). We also reviewed general application areas and specific ready-to-use solutions that require no code changes.

How can Huenei help your business with Hardware Acceleration with FPGAs?


Infrastructure: Definition, acquisition, and start-up (Cloud & On-premise).


Consulting: Advice on and deployment of the available frameworks, to obtain acceleration without changes in the code.


Development: Adaptation of existing software through the use of accelerated libraries to increase its performance.