Search Results Heading

MBRLSearchResults

mbrl.module.common.modules.added.book.to.shelf
Title added to your shelf!
View what I already have on My Shelf.
Oops! Something went wrong.
Oops! Something went wrong.
While trying to add the title to your shelf something went wrong :( Kindly try again later!
Are you sure you want to remove the book from the shelf?
Oops! Something went wrong.
Oops! Something went wrong.
While trying to remove the title from your shelf something went wrong :( Kindly try again later!
    Done
    Filters
    Reset
  • Discipline
      Discipline
      Clear All
      Discipline
  • Is Peer Reviewed
      Is Peer Reviewed
      Clear All
      Is Peer Reviewed
  • Item Type
      Item Type
      Clear All
      Item Type
  • Subject
      Subject
      Clear All
      Subject
  • Year
      Year
      Clear All
      From:
      -
      To:
  • More Filters
1 result(s) for "double‐precision inputs"
Sort by:
Efficient digital implementation of a multi-precision square-root algorithm
In high performance computing systems and signal processing, there is a basic set of mathematical functions that are essential. While addition, subtraction and multiplication are well understood, there is less literature on square-rooting, which is a particularly time- and resource-consuming function. Traditional non-restoring algorithms produce a mantissa half the length of the input mantissa, causing a loss of precision. This study presents a method for increasing the accuracy of this algorithm. It is shown to work for all IEEE-754R standard floating-point numbers. Error analysis shows a 57-fold (for half-precision) and 134e6-fold improvement (for double-precision) in the normalised error, equivalent to at most 1 Units of Least Precision. Resource and performance optimised variants are analysed and their throughput analysed. On an Intel Stratix V device, performance optimised implementations achieve a throughput of 717 MFLOPs. Resource optimised implementations on a low-cost device require only 127 Adaptive Logic Modules and 232 registers, with a throughput of 8.56 MFLOPs. All implementations are DSP block and memory free, saving valuable resources. The maximum throughput of the presented design is 15.5 times greater than that proposed by Pimentel et al. and two orders of magnitude greater than typical multiply-accumulate methods.