343 results for "Memory access pattern"
An Area-Efficient and Configurable Number Theoretic Transform Accelerator for Homomorphic Encryption
Homomorphic Encryption (HE) allows for arbitrary computation on encrypted data, offering a method for preserving privacy in cloud computations. However, efficiency remains a significant obstacle, particularly the polynomial multiplication of large parameter sets, which incurs substantial compute and memory overhead. Prior studies proposed the use of the Number Theoretic Transform (NTT) to accelerate polynomial multiplication, which proved efficient owing to its low computational complexity. However, these efforts primarily focused on NTT designs for small parameter sets, and configurability and memory efficiency were not considered carefully. This paper focuses on designing a unified NTT/Inverse NTT (INTT) architecture with high area efficiency and configurability, which is more suitable for HE schemes. We adopt the Constant-Geometry (CG) NTT algorithm and propose a conflict-free access pattern, demonstrating a 16.7% reduction in coefficient storage compared to the state-of-the-art CG NTT design. Additionally, we propose a twiddle factor generation strategy to minimize storage for Twiddle Factors (TFs). The proposed architecture offers configurability at both compile time and runtime, allowing varying parallelism and parameter sets to be deployed during compilation while accommodating a wide range of polynomial degrees and moduli after compilation. Experimental results on a Xilinx FPGA show that our design achieves higher area efficiency and configurability than previous works. Furthermore, we explore the performance difference between precomputed TFs and online-generated TFs for the NTT architecture, aiming to show the importance of online-generation-based NTT architectures in HE applications.
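The NTT at the core of this line of work can be sketched with toy parameters. The values below (p = 17, n = 8, w = 9, where 9 is a primitive 8th root of unity mod 17) are illustrative choices only; real HE schemes use much larger NTT-friendly primes and typically a negacyclic variant, and the paper's conflict-free access pattern and twiddle-factor generation are hardware concerns not modeled here.

```python
def ntt(a, w, p):
    """Evaluate polynomial a at the powers of w mod p (radix-2 Cooley-Tukey)."""
    n = len(a)
    if n == 1:
        return a[:]
    even = ntt(a[0::2], w * w % p, p)   # even-indexed coefficients
    odd = ntt(a[1::2], w * w % p, p)    # odd-indexed coefficients
    out, t = [0] * n, 1
    for k in range(n // 2):
        out[k] = (even[k] + t * odd[k]) % p
        out[k + n // 2] = (even[k] - t * odd[k]) % p
        t = t * w % p                   # twiddle factor w^k
    return out

def intt(a, w, p):
    """Inverse NTT: forward transform with w^-1, then scale by n^-1 mod p."""
    n_inv = pow(len(a), -1, p)
    return [x * n_inv % p for x in ntt(a, pow(w, -1, p), p)]

# Toy parameters: 9^4 = -1 mod 17, so 9 is a primitive 8th root of unity.
p, n, w = 17, 8, 9
a = [1, 2, 3, 4, 0, 0, 0, 0]
b = [5, 6, 7, 0, 0, 0, 0, 0]
# Pointwise products in the NTT domain give the cyclic convolution of a and b,
# i.e. the coefficients of a*b mod (x^8 - 1) over Z_17.
prod = intt([x * y % p for x, y in zip(ntt(a, w, p), ntt(b, w, p))], w, p)
```

This is the reason NTT accelerators matter for HE: the pointwise multiply replaces an O(n^2) coefficient convolution with O(n log n) work, and the butterfly access pattern inside `ntt` is exactly what memory-conflict-free designs optimize.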
An Instrumentation Approach for Hardware-Agnostic Software Characterization
Simulators and empirical profiling data are often used to understand how suitable a specific hardware architecture is for an application. However, simulators can be slow, and empirical profiling-based methods can only provide insights about the existing hardware on which the applications are executed. While the insights obtained in this way are valuable, such methods cannot be used to evaluate a large number of system designs efficiently. Analytical performance evaluation models are fast alternatives, particularly well-suited for system design-space exploration. However, to be truly application-specific, they need to be combined with a workload model that captures relevant application characteristics. In this paper we introduce PISA, a framework based on the LLVM infrastructure that is able to generate such a model for sequential and parallel applications by performing hardware-independent characterization. Characteristics such as instruction-level parallelism, memory access patterns and branch behavior are analyzed per thread or process during application execution. To illustrate the potential of the framework, we provide a detailed characterization of a representative benchmark for graph-based analytics, Graph 500. Finally, we analyze how the properties extracted with PISA across Graph 500 and SPEC CPU2006 applications compare to measurements performed on x86 and POWER8 processors.
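One classic hardware-independent memory characteristic that a framework like this can extract from an execution trace is the reuse distance of each access. The sketch below is not PISA's implementation, only an illustration of characterizing an access pattern without reference to any concrete cache; the toy trace is made up.

```python
def reuse_distances(trace):
    """For each access, count the distinct addresses touched since the
    previous access to the same address (inf on first use). Small reuse
    distances indicate temporal locality independent of any cache size."""
    last_seen = {}
    dists = []
    for i, addr in enumerate(trace):
        if addr in last_seen:
            dists.append(len(set(trace[last_seen[addr] + 1:i])))
        else:
            dists.append(float("inf"))  # compulsory first touch
        last_seen[addr] = i
    return dists

dists = reuse_distances(["a", "b", "a", "c", "b"])
```

A reuse-distance histogram translates directly into miss rates for any fully associative LRU cache size, which is what makes it useful for design-space exploration across hypothetical hardware.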
Switchable cache: utilising dark silicon for application specific cache optimisations
Caches are used to improve memory access time and energy consumption. The cache configuration which enables the best performance often differs between applications due to diverse memory access patterns. The authors present a new concept, called switchable cache, where multiple cache configurations exist on chip, leveraging the abundant transistors available due to what is known as the dark silicon phenomenon. Only one cache configuration is active at any given time based on the application under execution, while all other configurations remain inactive (dark). They describe an architecture to enable seamless integration of multiple cache configurations, and a novel design space exploration methodology to rapidly pre‐determine the optimal set of configurations at design‐time, for a given group of applications. For design spaces containing trillions of design points, the authors’ exploration methodology always found the optimal solution in less than 2 s. The switchable cache improved memory access time by up to 26.2% when compared to a fixed cache.
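The observation that the best configuration differs between applications can be reproduced with a toy direct-mapped miss counter; the two hypothetical 1 KiB configurations and the synthetic traces below are illustrative, not the authors' benchmarks or design space.

```python
def misses(addresses, num_sets, line_bytes):
    """Miss count for a direct-mapped cache (one line per set, toy model)."""
    tags = [None] * num_sets
    count = 0
    for addr in addresses:
        block = addr // line_bytes
        s = block % num_sets          # set index
        if tags[s] != block:          # compulsory or conflict miss
            tags[s] = block
            count += 1
    return count

# Two 1 KiB direct-mapped configurations with different line sizes.
cfg_long = dict(num_sets=16, line_bytes=64)    # few long lines
cfg_short = dict(num_sets=64, line_bytes=16)   # many short lines

sequential = list(range(0, 4096, 4))           # streaming word accesses
strided = [i * 80 for i in range(32)] * 10     # reused, scattered accesses

seq_long, seq_short = misses(sequential, **cfg_long), misses(sequential, **cfg_short)
str_long, str_short = misses(strided, **cfg_long), misses(strided, **cfg_short)
```

The streaming trace favors long lines (fewer compulsory misses), while the reused strided trace favors more sets (fewer conflicts): neither configuration wins on both, which is the motivation for keeping multiple configurations on chip and switching per application.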
Numerical algorithms for personalized search in self-organizing information networks
This book lays out the theoretical groundwork for personalized search and reputation management, both on the Web and in peer-to-peer and social networks. Representing much of the foundational research in this field, the book develops scalable algorithms that exploit the graphlike properties underlying personalized search and reputation management, and delves into realistic scenarios regarding Web-scale data. Sep Kamvar focuses on eigenvector-based techniques in Web search, introducing a personalized variant of Google's PageRank algorithm, and he outlines algorithms--such as the now-famous quadratic extrapolation technique--that speed up computation, making personalized PageRank feasible. Kamvar suggests that Power Method-related techniques ultimately should be the basis for improving the PageRank algorithm, and he presents algorithms that exploit the convergence behavior of individual components of the PageRank vector. Kamvar then extends the ideas of reputation management and personalized search to distributed networks like peer-to-peer and social networks. He highlights locality and computational considerations related to the structure of the network, and considers such unique issues as malicious peers. He describes the EigenTrust algorithm and applies various PageRank concepts to P2P settings. Discussion chapters summarizing results conclude the book's two main sections. Clear and thorough, this book provides an authoritative look at central innovations in search for all of those interested in the subject.
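The eigenvector computation underlying PageRank is the power method that the book's extrapolation techniques accelerate. A minimal sketch, assuming the standard damping formulation; the damping factor 0.85 and the three-node graph are made-up illustrative values, and the book's quadratic extrapolation and personalization are not shown:

```python
def pagerank(links, d=0.85, tol=1e-10):
    """Power-method PageRank on an adjacency dict {node: [outlinks]}."""
    nodes = list(links)
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    while True:
        new = {u: (1 - d) / n for u in nodes}   # teleportation term
        for u, outs in links.items():
            if outs:
                share = d * rank[u] / len(outs)
                for v in outs:
                    new[v] += share
            else:                               # dangling node: spread evenly
                for v in nodes:
                    new[v] += d * rank[u] / n
        if sum(abs(new[u] - rank[u]) for u in nodes) < tol:
            return new
        rank = new

graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
pr = pagerank(graph)
```

Each iteration contracts the error by roughly the damping factor, which is why acceleration techniques that exploit the convergence behavior of individual components pay off at Web scale.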
Chapter 23 - Vectorization Advice
This chapter introduces a tool that radically improves the ease with which you can analyze the nature of the vectorization in the hot loops of a program. It helps you discover how compiled code is vectorized, what is stopping code from being vectorized, whether loops carry dependencies, and what the memory access patterns look like; these insights are of value regardless of what system you are targeting. The examples provided in the chapter clearly show what can be achieved using the Vector Advisor.
Cache miss analysis of WHT algorithms
On modern computers, memory access patterns and cache utilization are as important, if not more important, than operation count in obtaining high-performance implementations of algorithms. In this work, the memory behavior of a large family of algorithms for computing the Walsh-Hadamard transform, an important signal processing transform related to the fast Fourier transform, is investigated. Empirical evidence shows that the family of algorithms exhibits a wide range of performance, despite the fact that all algorithms perform the same number of arithmetic operations. Different algorithms, while having the same number of memory operations, access memory in different patterns and consequently have different numbers of cache misses. A recurrence relation is derived for the number of cache misses and is used to determine the distribution of cache misses over the space of WHT algorithms.
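The abstract's central point, that WHT algorithms with identical operation counts can differ only in memory access order, is visible even in two tiny equivalent formulations: a recursive splitting allocates and walks contiguous halves, while an in-place iterative pass sweeps the array at growing strides. A minimal sketch of that pair (not the paper's full algorithm space):

```python
def wht(x):
    """Recursive Walsh-Hadamard transform (unnormalized)."""
    n = len(x)
    if n == 1:
        return x[:]
    half = n // 2
    a, b = wht(x[:half]), wht(x[half:])   # contiguous-halves access pattern
    return [a[i] + b[i] for i in range(half)] + \
           [a[i] - b[i] for i in range(half)]

def wht_iterative(x):
    """In-place iterative WHT: same arithmetic, different access order."""
    x = x[:]
    n, h = len(x), 1
    while h < n:
        for i in range(0, n, 2 * h):      # strided butterfly sweeps
            for j in range(i, i + h):
                x[j], x[j + h] = x[j] + x[j + h], x[j] - x[j + h]
        h *= 2
    return x
```

Both variants perform exactly n log2(n) additions/subtractions, yet at sizes exceeding the cache their stride patterns produce different miss counts, which is the effect the paper's recurrence relation quantifies.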
Working memory limitations constrain visual episodic long-term memory at both specific and gist levels of representation
Limitations in one’s capacity to encode information in working memory (WM) constrain later access to that information in long-term memory (LTM). The present study examined whether these WM constraints on episodic LTM are limited to specific representations of past episodes or also extend to gist representations. Across three experiments, young adult participants (n = 40 per experiment) studied objects in set sizes of two or six items, either sequentially (Experiments 1a and 1b) or simultaneously (Experiment 2). They then completed old/new recognition tests immediately after each sequence (WM tests). After a long study phase, participants completed LTM conjoint recognition tests, featuring old but untested items from the WM phase, lures that were similar to studied items at gist but not specific levels of representation, and new items unrelated to studied items at both specific and gist levels of representation. Results showed that LTM estimates of specific and gist memory representations from a multinomial-processing-tree model were reduced for items encoded under supra-capacity set sizes (six items) relative to within-capacity set sizes (two items). These results suggest that WM encoding capacity limitations constrain episodic LTM at both specific and gist levels of representation, at least for visual objects. The ability to retrieve from LTM each type of representation for a visual item is contingent on the degree to which the item could be encoded in WM.
DenRAM: neuromorphic dendritic architecture with RRAM for efficient temporal processing with delays
Neuroscience findings emphasize the role of dendritic branching in neocortical pyramidal neurons for non-linear computations and signal processing. Dendritic branches facilitate temporal feature detection via synaptic delays that enable coincidence detection (CD) mechanisms. Spiking neural networks highlight the significance of delays for spatio-temporal pattern recognition in feed-forward networks, eliminating the need for recurrent structures. Here, we introduce DenRAM, a novel analog electronic feed-forward spiking neural network with dendritic compartments. Utilizing 130 nm technology integrated with resistive RAM (RRAM), DenRAM incorporates both delays and synaptic weights. By configuring RRAMs to emulate bio-realistic delays and exploiting their heterogeneity, DenRAM mimics synaptic delays and efficiently performs CD for pattern recognition. Hardware-aware simulations on temporal benchmarks show DenRAM’s robustness against hardware noise, and its higher accuracy over recurrent networks. DenRAM advances temporal processing in neuromorphic computing, optimizes memory usage, and marks progress in low-power, real-time signal processing.

The authors present DenRAM, a hardware realization of a spiking neural network with a dendritic architecture. It utilizes memristive devices to implement both delay and weight parameters, enhancing low-power signal processing with reduced memory use.
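The delay-based coincidence detection that DenRAM realizes with RRAM can be caricatured in a few lines of software; the window, threshold, weights, and spike times below are made-up illustrative values, not the paper's circuit parameters.

```python
def fires(spike_times, delays, weights, window=1.0, threshold=1.5):
    """Fire iff the weighted sum of delayed spikes arriving within a
    coincidence window reaches the threshold (toy dendritic branch)."""
    arrivals = sorted((t + d, w)
                      for t, d, w in zip(spike_times, delays, weights))
    for start, _ in arrivals:
        total = sum(w for t, w in arrivals if start <= t < start + window)
        if total >= threshold:
            return True
    return False

# A pattern with a 2 ms inter-spike gap is detected when per-synapse delays
# realign the spikes; the same synapses ignore simultaneous input spikes.
matched = fires([0.0, 2.0], delays=[2.0, 0.0], weights=[1.0, 1.0])
unmatched = fires([0.0, 0.0], delays=[2.0, 0.0], weights=[1.0, 1.0])
```

Tuning the delay per synapse makes a purely feed-forward branch selective for a specific temporal pattern, which is why heterogeneous RRAM delays can substitute for recurrence.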
Decomposing the multiple encoding benefit in visual long-term memory: Primary contributions by the number of encoding opportunities
Although access to the seemingly infinite capacity of our visual long-term memory (VLTM) can be restricted by visual working memory (VWM) capacity at encoding and retrieval, access can be improved with repeated encoding. This leads to the multiple encoding benefit (MEB), the finding that VLTM performance improves as the number of opportunities to encode the same information increases over time. However, as the number of encoding opportunities increases, so do other factors such as the number of identical encoded VWM representations and chances to engage in successful retrieval during each opportunity. Thus, across two experiments, we disentangled the contributions of each of these factors to the MEB by having participants encode a varying number of identical objects across multiple encoding opportunities. Along with behavioural data, we also examined two established EEG correlates that track the number of maintained VWM representations, namely the posterior alpha suppression and the negative slow wave. Here, we identified that the primary mechanism behind the MEB was the number of encoding opportunities. That is, recognition memory performance was higher following an increase in the number of encoding opportunities, and this could not be attributed solely to an increase in the number of encoded VWM representations or successful retrieval. Our results thus contribute to the understanding of the fundamental mechanisms behind the influence of VWM on VLTM encoding.