Abstract
In high performance computing systems and signal processing, there is a basic set of mathematical functions that are essential. While addition, subtraction and multiplication are well understood, there is less literature on square-rooting, which is a particularly time- and resource-consuming function. Traditional non-restoring algorithms produce a mantissa half the length of
the input mantissa, causing a loss of precision. This study presents a method for increasing the accuracy of this algorithm. It is shown to work for all IEEE-754R standard floating-point numbers. Error analysis shows a 57-fold (for half-precision) and 134e6-
fold improvement (for double-precision) in the normalised error, equivalent to at most 1 Units of Least Precision. Resource and performance optimised variants are analysed and their throughput analysed. On an Intel Stratix V device, performance optimised implementations achieve a throughput of 717 MFLOPs. Resource optimised implementations on a low-cost device require only 127 Adaptive Logic Modules and 232 registers, with a throughput of 8.56 MFLOPs. All implementations are DSP
block and memory free, saving valuable resources. The maximum throughput of the presented design is 15.5 times greater than that proposed by Pimentel et al. and two orders of magnitude greater than typical multiply-accumulate methods.
the input mantissa, causing a loss of precision. This study presents a method for increasing the accuracy of this algorithm. It is shown to work for all IEEE-754R standard floating-point numbers. Error analysis shows a 57-fold (for half-precision) and 134e6-
fold improvement (for double-precision) in the normalised error, equivalent to at most 1 Units of Least Precision. Resource and performance optimised variants are analysed and their throughput analysed. On an Intel Stratix V device, performance optimised implementations achieve a throughput of 717 MFLOPs. Resource optimised implementations on a low-cost device require only 127 Adaptive Logic Modules and 232 registers, with a throughput of 8.56 MFLOPs. All implementations are DSP
block and memory free, saving valuable resources. The maximum throughput of the presented design is 15.5 times greater than that proposed by Pimentel et al. and two orders of magnitude greater than typical multiply-accumulate methods.
Original language | English |
---|---|
Pages (from-to) | 110-117 |
Number of pages | 8 |
Journal | IET Computers and Digital Techniques |
Volume | 13 |
Issue number | 2 |
Early online date | 21 Nov 2018 |
DOIs | |
Publication status | Published - 14 Mar 2019 |
ASJC Scopus subject areas
- Software
- Hardware and Architecture
- Electrical and Electronic Engineering
Fingerprint
Dive into the research topics of 'Efficient digital implementation of a multi-precision square-root algorithm'. Together they form a unique fingerprint.Profiles
-
Robert Watson
- Department of Electronic & Electrical Engineering - Senior Lecturer
- Water Innovation and Research Centre (WIRC)
- Electronics Materials, Circuits & Systems Research Unit (EMaCS)
- Centre for Digital, Manufacturing & Design (dMaDe)
- IAAPS: Propulsion and Mobility
Person: Research & Teaching, Core staff, Affiliate staff