4,120 results for "Distributed memory"
Scaling out NUMA-Aware Applications with RDMA-Based Distributed Shared Memory
The multicore evolution has stimulated renewed interest in scaling up applications on shared-memory multiprocessors, significantly improving the scalability of many applications. But this scalability is limited to a single node, so programmers still have to redesign applications to scale out over multiple nodes. This paper revisits the design and implementation of distributed shared memory (DSM) as a way to scale out applications optimized for non-uniform memory access (NUMA) architectures over a well-connected cluster. This paper presents MAGI, an efficient DSM system that provides a transparent shared address space with scalable performance on a cluster with fast network interfaces. MAGI is unique in that it presents a NUMA abstraction to fully harness the multicore resources in each node through hierarchical synchronization and memory management. MAGI also exploits the memory access patterns of big-data applications and leverages a set of optimizations for remote direct memory access (RDMA) to reduce the number of page faults and the cost of the coherence protocol. MAGI has been implemented as a user-space library with pthread-compatible interfaces and can run existing multithreaded applications with minimal modifications. We deployed MAGI on an 8-node RDMA-enabled cluster. Experimental evaluation shows that MAGI achieves up to a 9.25x speedup over an unoptimized implementation, delivering scalable performance for large-scale data-intensive applications.
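Abstracts like this one assume familiarity with page-based DSM coherence, the main source of the page faults mentioned above. A toy single-process sketch of an invalidate-based protocol (illustrative only, not MAGI's actual design or API):

```python
# Toy model of invalidate-based page coherence, the kind of protocol a DSM
# system such as MAGI must run over RDMA. Single-process sketch for
# intuition only; not MAGI's real protocol.

class Page:
    def __init__(self):
        self.state = {}          # node_id -> "shared" | "exclusive"
        self.faults = 0          # coherence page faults served

    def read(self, node):
        if self.state.get(node) not in ("shared", "exclusive"):
            self.faults += 1                     # fetch a copy (RDMA read)
            for n, s in self.state.items():      # demote exclusive holder
                if s == "exclusive":
                    self.state[n] = "shared"
            self.state[node] = "shared"

    def write(self, node):
        if self.state.get(node) != "exclusive":
            self.faults += 1                     # upgrade: invalidate others
            self.state = {node: "exclusive"}

page = Page()
page.read(0); page.read(1)       # two nodes cache the page
page.write(0)                    # node 0 invalidates node 1's copy
print(page.state, page.faults)   # {0: 'exclusive'} 3
```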
Automating functional unit and register binding for synchoros CGRA platform
Coarse-grain reconfigurable architectures, which provide high computing throughput, low cost, scalability, and energy efficiency, have grown in popularity in recent years. SiLago is a new VLSI design framework comprising two coarse-grain reconfigurable fabrics: a dynamically reconfigurable resource array and a distributed memory architecture. It employs the Vesyla compiler to map streaming applications onto these fabrics. Binding is a critical step in high-level synthesis that maps operations and variables to functional units and storage elements in the design, and it influences design performance metrics such as power, latency, and area. The current version of Vesyla does not support automatic binding; bindings must be specified manually through pragmas, which makes it less flexible. This paper proposes approaches to automate binding in Vesyla. We present a list-scheduling-based approach to automate functional unit binding and an integer linear programming approach to automate register binding. Furthermore, we determine the binding of various basic linear algebra subprogram (BLAS) and image processing tasks using the proposed approaches. Finally, we compare automatic and manual binding in terms of power dissipation and latency across several benchmarks. The experimental results show that the proposed automatic binding consumes significantly less power at nearly the same latency as manual binding.
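List scheduling, the core of the proposed functional-unit binding, greedily assigns ready operations to free units each cycle. A minimal sketch under assumed inputs (operation types, dependences, per-type latencies; none of these names come from Vesyla):

```python
# Minimal list-scheduling-based functional-unit binding, in the spirit of
# the approach described above; the real Vesyla flow is far more involved.
from heapq import heappush, heappop

def list_schedule(ops, deps, latency, num_fus):
    """ops: {op: fu_type}; deps: {op: [preds]}; num_fus: {fu_type: count}."""
    done_at, binding = {}, {}
    free = {t: list(range(n)) for t, n in num_fus.items()}  # idle FU ids
    busy = []                                # (finish_time, fu_type, fu_id)
    t = 0
    while len(done_at) < len(ops):
        while busy and busy[0][0] <= t:      # release FUs that finished
            _, ft, fid = heappop(busy)
            free[ft].append(fid)
        ready = [o for o in ops if o not in binding
                 and all(p in done_at and done_at[p] <= t for p in deps[o])]
        for o in ready:
            ft = ops[o]
            if free[ft]:                     # bind to a free unit, else wait
                fid = free[ft].pop()
                binding[o] = (ft, fid)
                done_at[o] = t + latency[ft]
                heappush(busy, (done_at[o], ft, fid))
        t += 1
    return binding, max(done_at.values())

ops = {"a": "mul", "b": "mul", "c": "add"}
deps = {"a": [], "b": [], "c": ["a", "b"]}
print(list_schedule(ops, deps, {"mul": 2, "add": 1}, {"mul": 1, "add": 1}))
# with one multiplier, a and b share FU ('mul', 0) and the makespan is 5
```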
Efficient Breadth-First Search on Massively Parallel and Distributed-Memory Machines
There are many large-scale graphs in the real world, such as Web graphs and social graphs, and interest in large-scale graph analysis has grown in recent years. Breadth-First Search (BFS) is one of the most fundamental graph algorithms and is used as a component of many others. Our new method for distributed parallel BFS can compute BFS on a one-trillion-vertex graph within half a second using large supercomputers such as the K-Computer. Using the proposed algorithm, the K-Computer was ranked 1st in the Graph500 in June and November 2015 and June 2016, using all 82,944 available nodes and achieving 38,621.4 GTEPS. Based on the hybrid BFS algorithm by Beamer (Proceedings of the 2013 IEEE 27th International Symposium on Parallel and Distributed Processing Workshops and PhD Forum, IPDPSW '13, IEEE Computer Society, Washington, 2013), we devise sets of optimizations for scaling to extreme numbers of nodes, including a new efficient graph data structure and optimization techniques such as vertex reordering and load balancing. Our performance evaluation on the K-Computer shows that our new BFS is 3.19 times faster on 30,720 nodes than the base version using the previously known best techniques.
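The hybrid (direction-optimizing) BFS of Beamer et al. that this work builds on alternates between top-down frontier expansion and bottom-up parent search. A minimal shared-memory sketch, with the switching rule simplified to a single alpha parameter (the distributed version partitions the graph across nodes):

```python
# Direction-optimizing BFS: go bottom-up when the frontier touches many
# edges, since most unvisited vertices then find a frontier parent quickly.
def hybrid_bfs(adj, source, alpha=4):
    n = len(adj)
    parent = [-1] * n
    parent[source] = source
    frontier = {source}
    while frontier:
        frontier_edges = sum(len(adj[v]) for v in frontier)
        unvisited = [v for v in range(n) if parent[v] == -1]
        if frontier_edges * alpha > sum(len(adj[v]) for v in unvisited):
            nxt = set()                       # bottom-up: scan for a parent
            for v in unvisited:
                for u in adj[v]:
                    if u in frontier:
                        parent[v] = u
                        nxt.add(v)
                        break
        else:
            nxt = set()                       # top-down: relax out-edges
            for u in frontier:
                for v in adj[u]:
                    if parent[v] == -1:
                        parent[v] = u
                        nxt.add(v)
        frontier = nxt
    return parent

adj = [[1, 2], [0, 3], [0, 3], [1, 2]]   # small undirected test graph
print(hybrid_bfs(adj, 0))                # [0, 0, 0, 1]
```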
Resource abstraction and data placement for distributed hybrid memory pool
Emerging byte-addressable non-volatile memory (NVM) technologies offer higher density and lower cost than DRAM, at the expense of lower performance and limited write endurance. There have been many studies on hybrid NVM/DRAM memory management within a single physical server, but how to manage hybrid memories efficiently in a distributed environment remains an open problem. This paper proposes Alloy, a memory resource abstraction and data placement strategy for an RDMA-enabled distributed hybrid memory pool (DHMP). Alloy provides simple APIs for applications to utilize DRAM or NVM resources in the DHMP without being aware of the hardware details of the DHMP. We propose a hotness-aware data placement scheme that combines hot-data migration, data replication, and write merging to improve application performance and reduce the cost of DRAM. We evaluate Alloy with several micro-benchmark and public benchmark workloads. Experimental results show that Alloy can reduce DRAM usage in the DHMP by up to 95% while reducing total memory access time by up to 57% compared with state-of-the-art approaches.
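The hotness-aware placement idea can be illustrated with a toy policy that keeps the most-accessed pages in scarce DRAM. The counting scheme and API below are illustrative only; Alloy's real strategy additionally migrates, replicates, and merges writes:

```python
# Toy hotness-aware placement: track per-page access counts and serve the
# hottest pages from DRAM, spilling colder ones to NVM.
from collections import Counter

class HybridPool:
    def __init__(self, dram_pages):
        self.dram_pages = dram_pages   # DRAM capacity, in pages
        self.heat = Counter()          # page -> access count
        self.in_dram = set()

    def access(self, page):
        self.heat[page] += 1
        self.rebalance()
        return "DRAM" if page in self.in_dram else "NVM"

    def rebalance(self):
        # promote the hottest pages, implicitly demoting the rest to NVM
        self.in_dram = {p for p, _ in self.heat.most_common(self.dram_pages)}

pool = HybridPool(dram_pages=2)
for page in [1, 1, 1, 2, 2, 3]:
    print(page, pool.access(page))
# pages 1 and 2 end up served from DRAM; the cold page 3 stays in NVM
```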
Impacts of Topology and Bandwidth on Distributed Shared Memory Systems
As high-performance computing designs become increasingly complex, the importance of evaluation by simulation also grows. One of the most critical aspects of distributed computing design is the network architecture: different topologies and bandwidths have dramatic impacts on the overall performance of the system and should be explored to find the optimal design point. This work uses simulations developed for the existing Structural Simulation Toolkit v12.1.0 software framework to show that, for a hypothetical test case, more complicated network topologies deliver better overall performance, and performance improves with increased bandwidth, making them worth the additional design effort and expense. Specifically, the HyperX topology is shown to outperform the next-best evaluated topology by thirty percent and is the only topology that did not experience diminishing performance gains with increased bandwidth.
Integration and optimization of multiple big data processing platforms
Purpose – The purpose of this paper is to integrate and optimize a multiple big data processing platform with high performance, high availability, and high scalability in a big data environment.
Design/methodology/approach – First, the integration of Apache Hive, Cloudera Impala, and BDAS Shark makes the platform support SQL-like queries. Next, users access a single interface, and the proposed optimizer automatically selects the best-performing big data warehouse platform. Finally, the distributed memory storage system Memcached, incorporated into the Apache HDFS distributed file system, is employed to cache query results. If users issue the same SQL command, the result is returned rapidly from the cache instead of repeating the search in the big data warehouse, which would take far longer.
Findings – The proposed approach significantly improves overall performance and dramatically reduces search time when querying a database, especially for highly repeated SQL commands under multi-user mode.
Research limitations/implications – Currently, Shark's latest stable version, 0.9.1, does not support the latest versions of Spark and Hive. In addition, this series of software only supports Oracle JDK7; using Oracle JDK8 or OpenJDK causes serious errors, and some software will not run.
Practical implications – One problem with this system is that some blocks are missing when too many blocks are stored in one result (about 100,000 records). Another is that sequential writing into the in-memory cache wastes time.
Originality/value – When the remaining memory capacity is 2 GB or less on each server, Impala and Shark incur heavy page swapping, causing extremely low performance; at larger data scales this may cause JVM I/O exceptions and crash the program. When the remaining memory is sufficient, however, Shark is faster than Hive and Impala, and Impala's memory consumption lies between those of Shark and Hive. In this study, each server allocates 20 GB of memory for cluster computing and sets remaining-memory thresholds at Level 1: 3 percent (0.6 GB), Level 2: 15 percent (3 GB), and Level 3: 75 percent (15 GB). The program automatically selects Hive when remaining memory is below 15 percent, Impala at 15 to 75 percent, and Shark above 75 percent.
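The two mechanisms described under Design/methodology/approach are easy to sketch: an engine selector driven by the free-memory thresholds reported above, and a result cache keyed by a hash of the SQL text. A plain dictionary stands in for Memcached here, and the run callback is a placeholder for dispatching to a real warehouse:

```python
# Sketch of engine selection by remaining memory plus query-result caching.
import hashlib

cache = {}                             # stand-in for a Memcached cluster

def pick_engine(free_mem_fraction):
    # thresholds from the paper: Hive < 15%, Impala 15-75%, Shark > 75%
    if free_mem_fraction < 0.15:
        return "Hive"
    if free_mem_fraction <= 0.75:
        return "Impala"
    return "Shark"

def query(sql, free_mem_fraction, run):
    key = hashlib.sha1(sql.encode()).hexdigest()
    if key in cache:                   # repeated SQL: answer from the cache
        return cache[key]
    engine = pick_engine(free_mem_fraction)
    result = run(engine, sql)          # dispatch to the chosen warehouse
    cache[key] = result
    return result

fake_run = lambda engine, sql: f"{engine} answered: {sql}"
print(query("SELECT count(*) FROM logs", 0.80, fake_run))  # runs on Shark
print(query("SELECT count(*) FROM logs", 0.10, fake_run))  # cache hit
```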
Improving prediction with enhanced Distributed Memory-based Resilient Dataset Filter
Launching new products in the consumer electronics market is challenging: developing and marketing them in limited time affects the sustainability of such companies. This research work introduces a model that can predict the success of a product. A Feature Information Gain (FIG) measure is used to identify significant features, and a Distributed Memory-based Resilient Dataset Filter (DMRDF) is used to eliminate duplicate reviews, which in turn improves the reliability of the product reviews. The pre-processed dataset is then used to predict a product's prospects before its market launch, using classifiers such as logistic regression and support vector machines. The DMRDF method is fault-tolerant because of its resilience property and also reduces dataset redundancy, which increases the prediction accuracy of the model. The proposed model works in a distributed environment to handle massive datasets and is therefore scalable. The output of this feature modelling and prediction allows manufacturers to optimize the design of their new products.
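The "Resilient Dataset" in DMRDF suggests Spark-style resilient distributed datasets (RDDs); assuming that reading, the duplicate-review elimination step can be expressed in a few lines of PySpark (the paper's actual filter adds its own logic on top of this basic idea):

```python
# Distributed, fault-tolerant duplicate elimination on an RDD of reviews.
from pyspark import SparkContext

sc = SparkContext("local[*]", "dedup-reviews")
reviews = sc.parallelize([
    ("user1", "Great battery life"),
    ("user2", "Screen is too dim"),
    ("user1", "Great battery life"),   # duplicate review
])
unique = reviews.distinct()            # resilience comes from RDD lineage
print(unique.count())                  # 2
sc.stop()
```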
Brain-wide mapping reveals that engrams for a single memory are distributed across multiple brain regions
Neuronal ensembles that hold a specific memory (memory engrams) have been identified in the hippocampus, amygdala, and cortex. However, it has been hypothesized that the engrams of a specific memory are distributed among multiple functionally connected brain regions, referred to as a unified engram complex. Here, we report a partial map of the engram complex for contextual fear conditioning memory by characterizing encoding-activated neuronal ensembles in 247 regions using tissue phenotyping in mice. The mapping was aided by an engram index, which identified 117 cFos+ brain regions holding engrams with high probability, and by brain-wide reactivation of these neuronal ensembles upon recall. Optogenetic manipulation experiments revealed engram ensembles, many of which were functionally connected to hippocampal or amygdala engrams. Simultaneous chemogenetic reactivation of multiple engram ensembles conferred a greater level of memory recall than reactivation of a single engram ensemble, reflecting the natural memory recall process. Overall, our study supports the unified engram complex hypothesis for memory storage. Where memories are located in the brain is not well understood; in this paper, the authors demonstrate that memories are spread across multiple brain regions.
Spatial movement with distributed memory
Diffusion has been widely applied to model animal movement that follows Brownian motion. However, animals typically move in non-Brownian ways due to their perceptual judgment. Spatial memory and cognition have recently received much attention in characterizing complicated animal movement behaviours. In this paper, explicit spatial memory is modeled via a distributed delayed diffusion term. The distributed time represents memory growth and decay over time, and the spatial nonlocality reflects the dependence of spatial memory on location. When the temporal delay kernel is weak, under the assumption that animals acquire knowledge immediately and memory decays over time, the equation is equivalent to a Keller–Segel chemotaxis model. For the strong kernel, with learning and memory decay stages, rich spatiotemporal dynamics such as Turing and checker-board patterns appear via spatially non-homogeneous steady-state and Hopf bifurcations.
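For readers unfamiliar with the kernel terminology, a generic distributed-delay memory model of this kind can be written as follows (notation is illustrative, not the paper's exact equations; the paper's version also includes a spatial averaging of past densities):

```latex
% Movement is biased by a memory field v built from past population
% densities u, weighted by a temporal kernel K.
\begin{align*}
  u_t &= d\,\Delta u - \nabla\cdot\bigl(\beta\,u\,\nabla v\bigr), \\
  v(x,t) &= \int_{-\infty}^{t} K(t-s)\,u(x,s)\,\mathrm{d}s .
\end{align*}
% Weak kernel (knowledge acquired at once, then decaying):
%   K(\tau) = \tau_0^{-1}\, e^{-\tau/\tau_0}.
% Strong kernel (a learning stage before decay):
%   K(\tau) = \tau\,\tau_0^{-2}\, e^{-\tau/\tau_0}.
% With the weak kernel, differentiating v in time closes the system into a
% Keller--Segel-type chemotaxis model, as the abstract notes.
```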