As multi-core processor technology has improved, the demand for application performance has also increased. Delivering high performance to an application requires more processing speed from multi-core processors, which in turn increases power consumption.
As power has become a main metric for modern high performance computing, researchers and system architects are proposing heterogeneous multi-core processing systems that combine multi-core processors with hardware accelerators or co-processors.
These accelerators improve the performance of compute-intensive applications by executing specific tasks. Over the past few decades, FPGA-based accelerators have delivered considerable improvements in performance and power efficiency, which makes them attractive to the high performance computing world.
Their flexibility and higher performance per watt show that FPGAs are capable of competing with both superscalar processors and GPU accelerators, especially for high performance computing applications.
In recent years, ARM-based servers built on embedded processors have gained popularity in academia and industry due to their low cost and low power consumption compared to conventional processors. The computational capability of ARM embedded processors does not match that of x86 processors in server environments, but according to recent research, their market share is expected to rise 25% in 2020. ARM-based servers are favorable for applications that need high throughput rather than raw computing power.
Therefore, owing to their hardware-level reconfigurability and capacity for high performance computing, FPGAs are gaining attention as a replacement for ASIC accelerators in high performance computing. The Supercomputing Team at UCERD Pvt Ltd, Islamabad, has proposed and built an FPGA-Powered Supercomputer. The supercomputer uses five Zynq SoC compute nodes and an Intel master node.
The system uses message passing interface (MPI) libraries for communication between compute nodes, and AXI4-Stream interfaces between the ARM processor and the FPGA inside each compute node. The system is able to exploit parallelism when executing high performance computing applications. An implementation of an FIR filter application on the system shows that the computational capability of the ARM processor is increased by integrating an FPGA accelerator to execute the compute-intensive portion of the application.
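As a rough illustration of how an FIR workload can be divided among compute nodes, the sketch below splits the input into contiguous blocks, gives each block a short overlap ("halo") of preceding samples, filters each block independently, and concatenates the results. It is a minimal single-process simulation under stated assumptions, not the implementation used on the system: the function names (fir_filter, split_with_halo, distributed_fir) are hypothetical, the loop over blocks stands in for an MPI scatter/gather, and the per-block call stands in for the kernel that would be offloaded to the FPGA over AXI4-Stream.

```python
def fir_filter(samples, taps):
    """Direct-form FIR: y[n] = sum_k taps[k] * x[n-k], with x[n] = 0 for n < 0."""
    out = []
    for n in range(len(samples)):
        acc = 0.0
        for k, h in enumerate(taps):
            if n - k >= 0:
                acc += h * samples[n - k]
        out.append(acc)
    return out


def split_with_halo(samples, num_nodes, num_taps):
    """Split the input into contiguous blocks, one per node (hypothetical helper).

    Each block is prefixed with up to num_taps - 1 preceding samples so that
    a node can compute its outputs without data from its neighbours.
    Returns a list of (halo_length, block) pairs.
    """
    chunk = max(1, -(-len(samples) // num_nodes))  # ceiling division
    halo = num_taps - 1
    jobs = []
    for start in range(0, len(samples), chunk):
        lead = min(halo, start)
        jobs.append((lead, samples[start - lead:start + chunk]))
    return jobs


def distributed_fir(samples, taps, num_nodes=4):
    """Simulate scatter, per-node filtering, and gather in one process."""
    result = []
    for lead, block in split_with_halo(samples, num_nodes, len(taps)):
        partial = fir_filter(block, taps)  # stand-in for the offloaded kernel
        result.extend(partial[lead:])      # drop outputs that belong to the halo
    return result


if __name__ == "__main__":
    x = list(range(1, 41))
    h = [1, 2, 3, 4, 5]
    # The partitioned result should match the single-node result exactly.
    print(distributed_fir(x, h, num_nodes=4) == fir_filter(x, h))
```

The halo is what makes the blocks independent: an FIR output depends only on the current sample and the previous num_taps - 1 samples, so no communication between nodes is needed once the blocks have been distributed.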
Four compute nodes of ARM processors with FPGA accelerators achieve 8.56 times the performance of multiple nodes without accelerators. The Zynq SoC supercomputer architecture suggests that advances in processor technology will narrow the gap between embedded and conventional processors in future high performance supercomputing.
In the past, a number of heterogeneous architecture platforms have been proposed. These platforms provide a foundation for integrating FPGAs with conventional processors to accelerate high performance applications. The following research works show the promise of combining FPGA-based accelerators with other computing units. The Cray XD1 supercomputer, the Berkeley Emulation Engine 2 (BEE2), and the Maxwell project used FPGAs as the only computing elements in their supercomputing clusters for application acceleration.
In 2008, Convey Computer Corporation designed a heterogeneous computing platform that combines one or more x86 processors with an FPGA-based application accelerator. The Convey Hybrid-Core HC-1 was its first product, consisting of an Intel Xeon host processor and Xilinx Virtex FPGAs as a coprocessor. Tsoi K. H. presented a heterogeneous computer cluster known as Axel, consisting of an AMD Phenom quad-core CPU, an Nvidia GPU, and a Xilinx Virtex-5 FPGA attached via a PCI bus, to accelerate N-body simulation.
George A. et al. implemented a machine called the Novo-G supercomputer, made from 24 compute nodes, each pairing a quad-core Xeon processor with two PCIe x8 PROCStar-III accelerator boards. Each board comprises four Stratix-III E260 FPGAs from Altera.
Moreover, the following research works propose FPGA-based accelerators with embedded processors. Lin Z. demonstrated an FPGA-based Hadoop cluster called ZCluster, made of 8 computing nodes of Xilinx Zynq SoCs. The aim of ZCluster is to build a Hadoop cluster that increases the computing capabilities of ARM processors through the use of reconfigurable hardware accelerators.
Moorthy P. built a cluster of 32 nodes of Xilinx Zynq SoC chips. The main objective of the 32-node cluster is to assess the energy efficiency of hybrid SoCs for fast mapping of parallel graph algorithms such as neural network simulation. Bai X. et al. designed a cluster of 48 compute nodes, each composed of Xilinx Zynq SoC chips.
The hybrid architecture provides a platform that merges ARM CPUs with FPGA reconfigurable hardware. A non-subtraction Montgomery algorithm and a Chinese Remainder Theorem algorithm were implemented to test the performance of the hybrid architecture platform.
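For readers unfamiliar with the two benchmark algorithms, the sketch below shows one common formulation of each: Montgomery multiplication in which the conditional final subtraction is avoided by choosing R > 4N and keeping intermediate values in the range [0, 2N), and a two-modulus Chinese Remainder Theorem recombination. This is an illustrative software sketch only; the cited work's exact variant and its hardware mapping are not described here, and all function names (montgomery_setup, mont_mul, mod_mul, crt_pair) are hypothetical. It requires Python 3.8 or later for pow(x, -1, m).

```python
def montgomery_setup(n):
    """Pick R = 2**k with R > 4*n and precompute -n^{-1} mod R (n must be odd)."""
    k = n.bit_length() + 2            # n < 2**bit_length, so 2**k > 4*n
    r = 1 << k
    n_neg_inv = (-pow(n, -1, r)) % r
    return k, r, n_neg_inv


def mont_mul(a, b, n, k, n_neg_inv):
    """Return a value congruent to a*b*R^{-1} mod n, with no final subtraction.

    If a and b are below 2*n and R > 4*n, the result is also below 2*n,
    so results can be chained without a conditional subtraction.
    """
    mask = (1 << k) - 1
    t = a * b
    m = ((t & mask) * n_neg_inv) & mask
    return (t + m * n) >> k           # exact division by R


def mod_mul(x, y, n):
    """Compute x*y mod n via the Montgomery domain (assumes 0 <= x, y < n, n odd)."""
    k, r, n_neg_inv = montgomery_setup(n)
    r2 = (r * r) % n
    xm = mont_mul(x, r2, n, k, n_neg_inv)   # x*R mod n, in [0, 2n)
    ym = mont_mul(y, r2, n, k, n_neg_inv)   # y*R mod n, in [0, 2n)
    pm = mont_mul(xm, ym, n, k, n_neg_inv)  # x*y*R mod n, in [0, 2n)
    out = mont_mul(pm, 1, n, k, n_neg_inv)  # leave the Montgomery domain
    return out % n                          # single normalisation at the very end


def crt_pair(r1, m1, r2, m2):
    """Find x with x = r1 (mod m1) and x = r2 (mod m2), for coprime m1, m2."""
    inv = pow(m1, -1, m2)
    h = ((r2 - r1) * inv) % m2
    return r1 + m1 * h


if __name__ == "__main__":
    n = 1000003
    print(mod_mul(123456, 654321, n) == (123456 * 654321) % n)
    print(crt_pair(2, 3, 3, 5))  # 8, since 8 = 2 (mod 3) and 8 = 3 (mod 5)
```

Avoiding the data-dependent subtraction is attractive in hardware because every multiplication then takes the same sequence of operations, at the cost of slightly wider operands.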
The research works described above show that researchers and system architects have made great contributions to using FPGA-based accelerators with conventional processors as well as with embedded processors for high performance computing applications.