4,080 result(s) for "Floating point arithmetic"
Modified fast discrete‐time PID formulas for obtaining double precision accuracy
Proportional integral derivative (PID) controllers are widely used across various industries. This paper presents a new modified PID controller based on integer origin raw data that is equivalent to the classic PID controller based on floating‐point actual values. The new formulas provide a mathematical approach to the ‘Classic PID Formula’, ‘Subtractor Formula’ and ‘Scaling Formula’, which form the basis of the classic PID controller. The approach integrates these three formulas and separates them into integer and real values by applying the properties of associativity and commutativity. The method uses origin raw data as input to perform integer‐based computation and performs floating‐point operations once. This results in faster computation and energy savings, with accuracy comparable to the existing double precision formulas.
Implementing a chaotic cryptosystem in a 64-bit embedded system by using multiple-precision arithmetic
This paper proposes a new chaotic cryptosystem for the encryption of very high-resolution digital images, based on a digital chaos generator designed with arbitrary-precision arithmetic. This approach can reduce the dynamic degradation that chaotic models exhibit when they are implemented in digital devices and can increase the security of the cryptosystems. The results show that, with high-precision arithmetic, the generated sequences provide good randomness and security over a greater number of iterations of the implemented chaotic maps than sequences generated with the single-precision or double-precision formats of the IEEE 754 standard for floating-point arithmetic. The proposed method does not require high-cost hardware to increase numerical accuracy and security. An advantage over other recent works is that high precision, compared with single or double precision, yields an exponential increase in the key space: with multiple-precision arithmetic, a key space of 2^{33,268} or higher can be obtained, depending on the level of high precision configured. The security analysis confirms that the proposed chaotic cryptosystem is secure and robust against several known attacks, as well as under the NIST and TestU01 statistical tests, indicating that high-precision arithmetic helps to enhance the security of cryptosystems.
The Numerical Validation of the Adomian Decomposition Method for Solving Volterra Integral Equation with Discontinuous Kernels Using the CESTAC Method
The aim of this paper is to present a new method and tool for validating the numerical results of the Adomian decomposition method applied to Volterra integral equations with discontinuous kernels, in both linear and nonlinear forms. Because the traditional absolute error, which is based on floating-point arithmetic, has disadvantages as a measure of the accuracy of mathematical methods, we apply stochastic arithmetic and a new condition, based on two successive approximations, to study the efficiency of the method. To this end, the CESTAC method (Controle et Estimation Stochastique des Arrondis de Calculs) and the CADNA (Control of Accuracy and Debugging for Numerical Applications) library are employed. Finding the optimal iteration of the method, the optimal approximation, and the optimal error are some of the advantages of stochastic arithmetic, the CESTAC method, and the CADNA library over floating-point arithmetic and the usual packages. Theorems are proved that establish the convergence of the Adomian decomposition method for the problem. The main theorem of the CESTAC method is also presented; it shows that the number of common significant digits between the exact and approximate solutions equals the number between two successive approximations. This makes it possible to apply the new termination criterion instead of the absolute error. Several examples in both linear and nonlinear cases are solved, and the numerical results for stochastic arithmetic and floating-point arithmetic are compared to demonstrate the accuracy of the novel method.
Numerical behavior of NVIDIA tensor cores
We explore the floating-point arithmetic implemented in the NVIDIA tensor cores, which are hardware accelerators for mixed-precision matrix multiplication available on the Volta, Turing, and Ampere microarchitectures. Using Volta V100, Turing T4, and Ampere A100 graphics cards, we determine what precision is used for the intermediate results, whether subnormal numbers are supported, what rounding mode is used, in which order the operations underlying the matrix multiplication are performed, and whether partial sums are normalized. These aspects are not documented by NVIDIA, and we gain insight by running carefully designed numerical experiments on these hardware units. Knowing the answers to these questions is important if one wishes to: (1) accurately simulate NVIDIA tensor cores on conventional hardware; (2) understand the differences between results produced by code that utilizes tensor cores and code that uses only IEEE 754-compliant arithmetic operations; and (3) build custom hardware whose behavior matches that of NVIDIA tensor cores. As part of this work we provide a test suite that can be easily adapted to test newer versions of the NVIDIA tensor cores as well as similar accelerators from other vendors, as they become available. Moreover, we identify a non-monotonicity issue affecting floating point multi-operand adders if the intermediate results are not normalized after each step.
Floating-Point Quantization Analysis of Multi-Layer Perceptron Artificial Neural Networks
The impact of quantization in Multi-Layer Perceptron (MLP) Artificial Neural Networks (ANNs) is presented in this paper. In this architecture, the constant increase in size and the demand to decrease bit precision are two factors that contribute to the significant enlargement of quantization errors. We introduce an analytical tool that models the propagation of Quantization Noise Power (QNP) in floating-point MLP ANNs. Contrary to the state-of-the-art approach, which compares the exact and quantized data experimentally, the proposed algorithm can predict the QNP theoretically when the effect of operation quantization and Coefficient Quantization Error (CQE) are considered. This supports decisions in determining the required precision during the hardware design. The algorithm is flexible in handling MLP ANNs of user-defined parameters, such as size and type of activation function. Additionally, a simulation environment is built that can perform each operation on an adjustable bit precision. The accuracy of the QNP calculation is verified with two publicly available benchmarked datasets, using the default precision simulation environment as a reference.
Enabling Floating-Point Arithmetic in the Coq Proof Assistant
Floating-point arithmetic is a well-known and extremely efficient way of performing approximate computations over the real numbers. Although it requires some careful considerations, floating-point numbers are nowadays routinely used to prove mathematical theorems. Numerical computations have been applied in the context of formal proofs too, as illustrated by the CoqInterval library. But these computations do not benefit from the powerful floating-point units available in modern processors, since they are emulated inside the logic of the formal system. This paper experiments with the use of hardware floating-point numbers for numerically intensive proofs verified by the Coq proof assistant. This gives rise to various questions regarding the formalization, the implementation, the usability, and the level of trust. This approach has been applied to the CoqInterval and ValidSDP libraries, which demonstrates a speedup of at least one order of magnitude.
Hierarchical search algorithm for error detection in floating-point arithmetic expressions
Scientific and engineering applications rely on floating-point arithmetic to approximate real numbers. Due to the inherent rounding errors in floating-point numbers, error propagation during calculations can accumulate and lead to serious errors that may compromise the safety and reliability of the program. In theory, the most accurate method of error detection is to exhaustively search all possible floating-point inputs, but this is not feasible in practice due to the huge search space involved. Effectively and efficiently detecting maximum floating-point errors has been a challenge. To address this challenge, we design and implement an error detection tool for floating-point arithmetic expressions called HSED. It leverages modified mantissas under double precision floating-point types to simulate hierarchical searches from either half or single precision to double precision. Experimental results show that for 32 single-parameter arithmetic expressions in the FPBench benchmark test set, the error detection effects and performance of HSED are significantly better than the state-of-the-art error detection tools Herbie, S3FP and ATOMU. HSED outperforms Herbie, Herbie+, S3FP and ATOMU in 24, 19, 27 and 25 cases, respectively. The average time taken by Herbie, Herbie+, and S3FP is 1.82, 11.20, and 129.15 times longer than HSED, respectively.
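The basic idea of measuring a floating-point expression against a more precise reference can be sketched in a few lines of Python. This is not HSED's hierarchical search: it is a plain scan over a handful of hand-picked inputs, with exact rational arithmetic as the reference. The expression `1/x - 1/(x+1)` and the names `expr_float`, `expr_exact`, and `relative_error` are hypothetical choices made for illustration.

```python
from fractions import Fraction

def expr_float(x):
    # The expression under test, evaluated in double precision.
    return 1.0 / x - 1.0 / (x + 1.0)

def expr_exact(x):
    # The same expression evaluated exactly on the same input value.
    xr = Fraction(x)
    return 1 / xr - 1 / (xr + 1)

def relative_error(x):
    # Exact relative error of the double-precision result.
    exact = expr_exact(x)
    return abs(Fraction(expr_float(x)) - exact) / exact

# Scan a few powers of ten (all exactly representable as doubles).
candidates = [10.0 ** k for k in range(13)]
worst_input = max(candidates, key=relative_error)
worst_error = float(relative_error(worst_input))
print(f"largest relative error {worst_error:.3e} at x = {worst_input:.0e}")
```

The subtraction of two nearly equal reciprocals cancels most significant digits for large `x`, so even this crude scan finds inputs whose error is far above the unit roundoff.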
Extension of accurate numerical algorithms for matrix multiplication based on error-free transformation
The error-free transformation of matrix multiplication is a useful technique for accurate numerical computations in linear algebra problems. It can be used to transform the product of two floating-point matrices into an unevaluated sum of floating-point matrices, making it useful for developing accurate numerical algorithms for matrix multiplication. This technique splits both the left and right matrices into k floating-point matrices, and then k(k+1)/2 matrix multiplications are performed. We extend this technique and propose several accurate algorithms for matrix multiplication, which involve p matrix multiplications with p = 4, 5, 8, 9, respectively. The proposed algorithms efficiently provide results that are more accurate than those of double-precision arithmetic and less accurate than those of quadruple-precision arithmetic. In addition, we propose alternative forms to reduce the number of matrix multiplications with rounding errors. Numerical results show that the number of matrix multiplications affects the accuracy of the computed results. This dependence is examined using rounding error analysis and confirmed through numerical experiments.
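A scalar analogue may help illustrate what "error-free transformation" means. The sketch below is not the matrix algorithm from the abstract: it is the classical Veltkamp splitting and Dekker product for two doubles, which returns the rounded product `p` together with an error term `e` such that `p + e` equals the exact product (assuming round-to-nearest binary64 and no overflow or underflow). The function names `split` and `two_product` are chosen here for illustration.

```python
from fractions import Fraction

def split(a):
    # Veltkamp splitting: a == hi + lo, each half fitting in about 26 bits.
    c = 134217729.0 * a  # 2**27 + 1
    hi = c - (c - a)
    lo = a - hi
    return hi, lo

def two_product(a, b):
    # Dekker product: p is the rounded product, e the rounding error.
    p = a * b
    ah, al = split(a)
    bh, bl = split(b)
    e = al * bl - (((p - ah * bh) - al * bh) - ah * bl)
    return p, e

p, e = two_product(0.1, 0.3)
# Check exactness with rational arithmetic on the stored double values.
print(Fraction(p) + Fraction(e) == Fraction(0.1) * Fraction(0.3))
```

The product of two doubles is thus represented as an unevaluated sum of two doubles, which is the scalar counterpart of writing a matrix product as an unevaluated sum of floating-point matrices.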
Efficient n-th Root Computation on Microcontrollers Employing Magic Constants and Modified Newton and Householder Methods
With the growing number of applications in embedded systems—such as IoT modules, smart sensors, and wearable devices—there is an increasing demand for fast and accurate computations on resource-constrained platforms. In this paper, we present a new method for computing n-th roots in floating-point arithmetic based on an initial estimate generated by a “magic constant,” followed by one or two iterations of a modified Newton–Raphson or Householder algorithm. For cubic and quartic roots, we provide C implementations operating in single-precision floating-point format. The proposed algorithms are evaluated in terms of maximum relative error and execution time on selected microcontrollers. They exhibit high accuracy and noticeably reduced computation time. For example, our methods for computing cubic roots outperform the standard library function cbrtf() in both speed and precision. The results may be useful in a variety of fields, including biomedical and biophysical applications, statistical analysis, and real-time image and signal processing.
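The general shape of a magic-constant root computation can be sketched as follows. This is an illustration only and makes several assumptions: the constant `MAGIC` is an untuned value derived here as roughly two thirds of the bit pattern of 1.0f, not a constant from the paper; the refinement is the textbook Newton iteration rather than the paper's modified Newton or Householder schemes; and the iteration runs in Python's double precision, whereas the paper's C implementations operate in single precision. All function names are hypothetical.

```python
import struct

# Untuned illustrative constant: about (2/3) * 0x3F800000 (bits of 1.0f).
MAGIC = 0x2A555555

def float_to_bits(x):
    # Reinterpret a single-precision value as a 32-bit unsigned integer.
    return struct.unpack("<I", struct.pack("<f", x))[0]

def bits_to_float(i):
    # Reinterpret a 32-bit unsigned integer as a single-precision value.
    return struct.unpack("<f", struct.pack("<I", i))[0]

def cbrt_estimate(x):
    # Dividing the bit pattern by 3 roughly divides the exponent by 3;
    # the constant restores the exponent bias. Valid for positive normal x.
    return bits_to_float(float_to_bits(x) // 3 + MAGIC)

def cbrt_fast(x, iterations=2):
    # Refine the estimate with Newton's method for y**3 = x.
    y = cbrt_estimate(x)
    for _ in range(iterations):
        y = (2.0 * y + x / (y * y)) / 3.0
    return y

print(cbrt_estimate(27.0), cbrt_fast(27.0))
```

The bit-level estimate is only accurate to a few percent, but each Newton step roughly squares the relative error, which is why one or two iterations are enough for many embedded uses.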
Formally-Verified Round-Off Error Analysis of Runge–Kutta Methods
Numerical errors are insidious, difficult to predict and inherent in different levels of critical systems design. Indeed, numerical algorithms generally constitute approximations of an ideal mathematical model, which itself constitutes an approximation of a physical reality that has undergone multiple measurement errors. To these are added rounding errors due to computer arithmetic implementations, which are often neglected even though they can significantly distort the results obtained. This applies to Runge–Kutta methods used for the numerical integration of ordinary differential equations, which are ubiquitous in modeling fundamental laws of physics, chemistry, biology or economics. We provide a Coq formalization of the rounding error analysis of Runge–Kutta methods applied to linear systems and implemented in floating-point arithmetic. We propose a generic methodology to build a bound on the error accumulated over the iterations, taking gradual underflow into account. We then instantiate this methodology for two classic Runge–Kutta methods, namely Euler and RK2. The formalization of the results includes the definition of matrix norms, the proof of rounding error bounds of matrix operations and the formalization of the generic results and their applications on examples. In order to support the proposed approach, we provide numerical experiments on examples coming from nuclear physics applications.
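The distinction between round-off error and method error can be made concrete with a small experiment. The sketch below is an empirical measurement, not the formal bound from the paper: it runs the explicit Euler scheme on the scalar linear problem y' = λy once in double precision and once in exact rational arithmetic, using identical inputs, so that the difference is purely accumulated rounding error. The test problem, step size, and the name `euler_linear` are assumptions chosen for illustration.

```python
from fractions import Fraction

def euler_linear(lam, y0, h, steps):
    # Explicit Euler for y' = lam * y; works for floats and Fractions alike.
    y = y0
    for _ in range(steps):
        y = y + h * (lam * y)
    return y

lam, y0, h, steps = -1.0, 1.0, 0.1, 100

# Same scheme, same input values: once rounded, once exact.
y_float = euler_linear(lam, y0, h, steps)
y_exact = euler_linear(Fraction(lam), Fraction(y0), Fraction(h), steps)

# Accumulated round-off, isolated from the discretization error.
roundoff = abs(Fraction(y_float) - y_exact)
print(f"relative round-off after {steps} steps: {float(roundoff / y_exact):.3e}")
```

Because `Fraction(h)` is the exact value of the double `0.1`, both runs integrate the same discrete recurrence, and the reported quantity is the kind of accumulated error that a round-off bound is meant to enclose.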