Our paper, 'High Performance and Energy Efficient AMD and BWAD Pooling Schemes for CNN Accelerators', has been accepted and published at the 26th Euromicro Conference on Digital System Design (DSD) 2023 in Durrës, Albania. Huge thanks to Prof. Madhav Rao for the constant support and guidance, and to my co-author, Vinay Rayapati, for his significant contribution.
Abstract: Convolutional Neural Network (CNN) accelerator designs have a plethora of applications, but they incur high computational complexity and demand substantial resources. The challenge is to reduce various hardware costs in size-constrained edge-inferencing systems. Multiple methods, such as approximate multipliers, systolic arrays, and quantization techniques, to name a few, have previously been explored to this end. One of the important layers in a CNN is the pooling layer, where the feature-map size is reduced according to a predefined scheme. These pooling schemes play a prominent role in determining the accuracy obtained and also affect the hardware usage of accelerator designs. This paper presents the hardware design of two new pooling methods, Binary Weighted Absolute Deviation (BWAD) and Absolute Maximum Deviation (AMD), which show promising results in terms of area, delay, area-delay product (ADP), and power-delay product (PDP), while maintaining accuracy comparable to that of the existing state-of-the-art (SOTA) methods, Max and Absolute Average Deviation (AAD) pooling. The proposed schemes combine the best features of the existing methods, such as computing the absolute deviation between consecutive pixels to preserve accuracy, and map to simple hardware designs that require no multipliers or dividers. Both pooling methods are validated by incorporating the corresponding layers into multiple CNN architectures across several datasets for fair evaluation. Their hardware efficiency is investigated by synthesizing on the Zynq-7000 series ZedBoard (FPGA), and ASIC-flow synthesis results are obtained from the Cadence Genus tool for the 45 nm and 130 nm technology nodes. Hence, both proposed pooling schemes are potential candidates for on-chip neural-network hardware accelerators, exhibiting a fine balance between network accuracy and hardware benefits.
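For anyone curious how deviation-based pooling differs from plain max pooling, here is a minimal Python sketch. It is only an illustration under assumptions: the paper defines the exact AMD and BWAD computations, while the hypothetical `amd_pool_window` reducer below simply takes the maximum absolute difference between consecutive pixels in each window, echoing the abstract's description rather than reproducing the published scheme.

```python
import numpy as np

def amd_pool_window(window):
    # Hypothetical sketch of AMD-style pooling on one window (an assumption,
    # not the paper's exact definition): reduce the window to the maximum
    # absolute deviation between consecutive pixels.
    flat = window.flatten()
    deviations = np.abs(np.diff(flat))  # |p[i+1] - p[i]| for consecutive pixels
    return deviations.max()             # needs only subtract/compare in hardware

def pool2d(feature_map, k=2, stride=2, reducer=amd_pool_window):
    # Slide a k-by-k window over a single-channel feature map and apply the
    # reducer, mirroring how a pooling layer shrinks the feature-map size.
    h, w = feature_map.shape
    out_h, out_w = (h - k) // stride + 1, (w - k) // stride + 1
    out = np.empty((out_h, out_w), dtype=feature_map.dtype)
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = reducer(feature_map[i*stride:i*stride+k,
                                            j*stride:j*stride+k])
    return out

if __name__ == "__main__":
    fmap = np.array([[1, 3, 2, 8],
                     [4, 1, 0, 5],
                     [7, 2, 9, 6],
                     [3, 8, 4, 1]], dtype=np.int32)
    print(pool2d(fmap))  # 2x2 output from 2x2 windows, stride 2
```

Because the reduction uses only subtraction, absolute value, and comparison, it can be mapped to hardware without multipliers or dividers, which is the property the abstract highlights.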
You can read the paper on IEEE Xplore here: https://lnkd.in/gwkzBWC7.
#iiitb #ieee #dsd #euromicro #vlsi #cnn #accelerator #pooling