Abstract
Optimized hardware for the execution of large dot-product (DP) calculations is central to many of today's integrated circuits. These arithmetic blocks are often implemented with the parallel fused DP (FDP) approach and, to achieve high performance, are realized with a tree-based compression algorithm using commercially available synthesis macros. However, these macros optimize the performance of the gate-level netlist and fail to take into account the consequences of the applied heuristics on the physical implementation (layout) of these large circuits. In this article, we propose a physically aware approach to FDP implementation based on the affinity between the logic gates that make up the gate-level structure. The proposed clustered DP (CDP) algorithm enables the place-and-route tools to cluster gates with high affinity, leading to higher placement utilization and lower routing congestion. DP calculations with up to 78 multipliers were implemented with a 65-nm CMOS standard-cell library. Based on post-layout results, these implementations achieve up to 63% lower power, up to 60% lower area, and performance improvements as high as 2.5× compared with similar implementations based on commercial macros.
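As a rough illustration of the conventional FDP structure the abstract refers to (not the CDP algorithm itself, whose details the abstract does not give), the Python sketch below pools the partial products of every multiplier into one carry-save reduction tree and performs a single carry-propagate addition at the end. It assumes unsigned operands of a fixed bit width and applies 3:2 compressors to whole rows, which is a simplification of the bit-column compression in a real Wallace tree. All function names are hypothetical, and no placement or routing effects are modeled.

```python
from typing import List, Sequence, Tuple


def partial_products(a: int, b: int, width: int) -> List[int]:
    """AND-array partial products of an unsigned width-bit multiply: a << j for each set bit j of b."""
    return [a << j for j in range(width) if (b >> j) & 1]


def csa(x: int, y: int, z: int) -> Tuple[int, int]:
    """3:2 carry-save adder applied bitwise: x + y + z == s + c, with no carry propagation."""
    s = x ^ y ^ z
    c = ((x & y) | (x & z) | (y & z)) << 1  # majority bits become the carry row, shifted left
    return s, c


def wallace_reduce(rows: List[int]) -> Tuple[List[int], int]:
    """Apply layers of 3:2 compressors until at most two rows remain; returns (rows, layer count)."""
    layers = 0
    while len(rows) > 2:
        nxt: List[int] = []
        full = len(rows) - len(rows) % 3
        for i in range(0, full, 3):
            nxt.extend(csa(rows[i], rows[i + 1], rows[i + 2]))
        nxt.extend(rows[full:])  # 0-2 leftover rows pass through to the next layer
        rows = nxt
        layers += 1
    return rows, layers


def fused_dot_product(a: Sequence[int], b: Sequence[int], width: int) -> Tuple[int, int]:
    """Fused DP: one shared reduction tree for all multipliers, then a single final addition."""
    pool: List[int] = []
    for ai, bi in zip(a, b):
        pool.extend(partial_products(ai, bi, width))  # no per-multiplier carry-propagate add
    rows, layers = wallace_reduce(pool)
    return sum(rows), layers  # the one carry-propagate addition at the tree output


if __name__ == "__main__":
    value, layers = fused_dot_product([3, 5, 7, 2], [4, 6, 1, 7], width=3)
    print(value, layers)  # 63 4
```

Because only one carry-propagate addition occurs, the depth of the tree grows roughly logarithmically with the total number of partial products. The sketch captures only this logical structure; the layout-level effects of such wide trees, which motivate CDP, are outside its scope.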
Field | Value |
---|---|
Original language | English |
Article number | 8772144 |
Pages (from-to) | 2886-2897 |
Number of pages | 12 |
Journal | IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems |
Volume | 39 |
Issue number | 10 |
State | Published - 1 Oct 2020 |
Externally published | Yes |
Keywords
- Clustered dot-product
- digital signal processing (DSP)
- high-speed
- low-power design
- multiplication algorithm
- multiplier
- physically aware multiplier
- physically aware synthesis
- place and route
- sum-of-products
- Wallace tree
ASJC Scopus subject areas
- Software
- Computer Graphics and Computer-Aided Design
- Electrical and Electronic Engineering