Catalogue Search | MBRL
Explore the vast range of titles available.
234 result(s) for "Distributed shared memory"
Exploiting memory allocations in clusterised many‐core architectures
2019
Power‐efficient architectures have become the most important feature required for future embedded systems. Modern designs, like those released on mobile devices, reveal that clusterisation is the way to improve energy efficiency. However, such architectures are still limited by the memory subsystem (i.e. memory latency problems). This work investigates an alternative approach that exploits on‐chip data locality to a large extent, through distributed shared memory systems that permit efficient reuse of on‐chip mapped data in clusterised many‐core architectures. First, this work reviews the current literature on memory allocations and explores the limitations of cluster‐based many‐core architectures. Then, several memory allocations are introduced and benchmarked, in terms of scalability, performance and energy, against the conventional centralised shared memory solution in order to reveal which memory allocation is the most appropriate for future mobile architectures. The results show that distributed shared memory allocations bring performance gains and opportunities to reduce energy consumption.
Journal Article
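The locality argument in the abstract above can be made concrete with a toy cost model: accesses that stay inside a cluster are cheap, while accesses to a centralised memory pay a fixed interconnect cost. This is an illustrative sketch only; the cycle counts and the `cost` function are assumptions, not figures from the paper.

```python
# Toy model of why distributed on-chip allocation helps: accesses served
# inside the local cluster cost few cycles, accesses to a centralised
# memory pay an interconnect penalty. Latency numbers are illustrative.

LOCAL_CYCLES, REMOTE_CYCLES = 2, 20

def cost(accesses, local_fraction):
    """Total cycles for a mix of local and remote memory accesses."""
    local = int(accesses * local_fraction)
    return local * LOCAL_CYCLES + (accesses - local) * REMOTE_CYCLES

centralized = cost(1000, local_fraction=0.0)   # every access is remote
distributed = cost(1000, local_fraction=0.8)   # most data mapped locally
assert distributed < centralized
```

Under this model, any allocation that raises the locally served fraction of accesses reduces both latency and, typically, interconnect energy.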
Scaling out NUMA-Aware Applications with RDMA-Based Distributed Shared Memory
2019
The multicore evolution has stimulated renewed interest in scaling up applications on shared-memory multiprocessors, significantly improving the scalability of many applications. But the scalability is limited within a single node; therefore programmers still have to redesign applications to scale out over multiple nodes. This paper revisits the design and implementation of distributed shared memory (DSM) as a way to scale out applications optimized for non-uniform memory access (NUMA) architecture over a well-connected cluster. This paper presents MAGI, an efficient DSM system that provides a transparent shared address space with scalable performance on a cluster with fast network interfaces. MAGI is unique in that it presents a NUMA abstraction to fully harness the multicore resources in each node through hierarchical synchronization and memory management. MAGI also exploits the memory access patterns of big-data applications and leverages a set of optimizations for remote direct memory access (RDMA) to reduce the number of page faults and the cost of the coherence protocol. MAGI has been implemented as a user-space library with pthread-compatible interfaces and can run existing multithreaded applications with minimal modifications. We deployed MAGI over an 8-node RDMA-enabled cluster. Experimental evaluation shows that MAGI achieves up to 9.25x speedup compared with an unoptimized implementation, leading to scalable performance for large-scale data-intensive applications.
Journal Article
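The page-fault mechanism that MAGI optimises can be sketched in miniature: a directory maps each page to its owner, and a node's first access to a remote page triggers a "fault" that fetches and caches the page. All class and method names below are illustrative assumptions, not MAGI's actual API.

```python
# Minimal sketch of page-granularity distributed shared memory (DSM):
# a shared directory records each page's owner; a node's first read of
# a non-resident page faults, fetches the page, and caches it locally.

class DSMNode:
    def __init__(self, node_id, directory):
        self.node_id = node_id
        self.directory = directory   # shared map: page -> owner node
        self.pages = {}              # locally cached page contents
        self.faults = 0              # remote fetches performed

    def read(self, page):
        if page not in self.pages:   # miss: fetch from the owner
            self.faults += 1
            owner = self.directory[page]
            self.pages[page] = owner.pages[page]
        return self.pages[page]

directory = {}
a = DSMNode(0, directory)
b = DSMNode(1, directory)
a.pages["p0"] = "data0"
directory["p0"] = a

# First read on node b faults; the cached copy serves repeated reads.
assert b.read("p0") == "data0" and b.faults == 1
b.read("p0")
assert b.faults == 1   # served from cache, no second fault
```

Reducing the number of such faults, as the abstract describes, directly reduces traffic over the interconnect, which is why RDMA-aware prefetching and coherence optimisations pay off.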
Neuronal message passing using Mean-field, Bethe, and Marginal approximations
by
Parr, Thomas
,
Markovic, Dimitrije
,
Kiebel, Stefan J.
in
Bayesian analysis
2019
Neuronal computations rely upon local interactions across synapses. For a neuronal network to perform inference, it must integrate information from locally computed messages that are propagated among elements of that network. We review the form of two popular (Bayesian) message passing schemes and consider their plausibility as descriptions of inference in biological networks. These are variational message passing and belief propagation – each of which is derived from a free energy functional that relies upon different approximations (mean-field and Bethe respectively). We begin with an overview of these schemes and illustrate the form of the messages required to perform inference using Hidden Markov Models as generative models. Throughout, we use factor graphs to show the form of the generative models and of the messages they entail. We consider how these messages might manifest neuronally and simulate the inferences they perform. While variational message passing offers a simple and neuronally plausible architecture, it falls short of the inferential performance of belief propagation. In contrast, belief propagation allows exact computation of marginal posteriors at the expense of the architectural simplicity of variational message passing. As a compromise between these two extremes, we offer a third approach – marginal message passing – that features a simple architecture, while approximating the performance of belief propagation. Finally, we link formal considerations to accounts of neurological and psychiatric syndromes in terms of aberrant message passing.
Journal Article
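On a chain-structured model such as a Hidden Markov Model, belief propagation reduces to the classic forward-backward algorithm, whose forward and backward messages combine into exact marginal posteriors. The sketch below illustrates that scheme on a toy 2-state HMM; the parameter values are illustrative, not taken from the paper.

```python
# Belief propagation on a chain (an HMM) is the forward-backward
# algorithm: forward messages alpha and backward messages beta combine
# into exact marginal posteriors over the hidden state at each time.

def forward_backward(pi, A, obs_lik):
    n = len(obs_lik)
    # Forward pass: alpha[t][s] = p(obs_1..t, state_t = s)
    alpha = [[pi[s] * obs_lik[0][s] for s in range(2)]]
    for t in range(1, n):
        alpha.append([obs_lik[t][s] *
                      sum(alpha[-1][r] * A[r][s] for r in range(2))
                      for s in range(2)])
    # Backward pass: beta[t][s] = p(obs_t+1..n | state_t = s)
    beta = [[1.0, 1.0]]
    for t in range(n - 2, -1, -1):
        beta.insert(0, [sum(A[s][r] * obs_lik[t + 1][r] * beta[0][r]
                            for r in range(2)) for s in range(2)])
    # Combine and normalise into marginal posteriors.
    posts = []
    for t in range(n):
        unnorm = [alpha[t][s] * beta[t][s] for s in range(2)]
        z = sum(unnorm)
        posts.append([u / z for u in unnorm])
    return posts

pi = [0.5, 0.5]                      # uniform initial state prior
A = [[0.9, 0.1], [0.1, 0.9]]         # sticky state transitions
obs_lik = [[0.8, 0.2], [0.8, 0.2], [0.2, 0.8]]  # p(obs_t | state)
posts = forward_backward(pi, A, obs_lik)
assert all(abs(sum(p) - 1.0) < 1e-9 for p in posts)
assert posts[0][0] > posts[0][1]     # early evidence favours state 0
```

Variational (mean-field) message passing would replace the exact combination of alpha and beta with a factorised approximation, trading inferential accuracy for the simpler, more neuronally plausible architecture the abstract describes.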
λHive: Formal Semantics of an Edge Computing Model Based on JavaScript
2022
Edge computing is a paradigm that brings computation and data storage closer to the location where it is needed to improve response times and save bandwidth. It applies virtualization technology that makes it easier to deploy and run a wider range of applications on the edge servers and take advantage of largely unused computational resources. This article describes the design and formalization of Hive, a distributed shared memory model that can be transparently integrated with JavaScript using a standard, out-of-the-box runtime. To define such a model, a formal definition of the JavaScript language was used and extended to include modern capabilities and custom semantics. This extended model is used to prove that the distributed shared memory can operate on top of existing and unmodified web browsers, allowing the use of any computer and smartphone as a part of the distributed system. The proposed model guarantees the eventual synchronization of data across the whole system and provides the possibility to have a stricter consistency using standard HTTP operations.
Journal Article
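The "eventual synchronization" guarantee in the abstract above rests on a general principle: if every replica merges updates with the same deterministic rule, all replicas that have seen the same set of updates converge to the same value. A minimal sketch, assuming a last-writer-wins rule keyed on a timestamp (the paper's Hive semantics are richer than this):

```python
# Eventual convergence sketch: replicas apply updates in different
# orders, but a deterministic last-writer-wins merge rule makes them
# converge once they have seen the same update set.

from functools import reduce

def merge(state, update):
    """Keep whichever (timestamp, value) pair has the larger timestamp."""
    return update if update[0] > state[0] else state

updates = [(3, "c"), (1, "a"), (2, "b")]   # (timestamp, value) pairs

# Two replicas receive the same updates in different orders.
r1 = reduce(merge, updates, (0, None))
r2 = reduce(merge, sorted(updates), (0, None))
assert r1 == r2 == (3, "c")
```

Because the merge is commutative and order-insensitive over the same update set, delivery order over the network does not matter, which is exactly what makes the model workable over unmodified browsers.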
Event‐based high throughput computing: A series of case studies on a massively parallel softcore machine
by
Beaumont, Jonathan
,
Luk, Wayne
,
McLachlan Bragg, Graeme
in
Communication
,
Field programmable gate arrays
,
Neural networks
2023
This paper introduces an event‐based computing paradigm, where workers only perform computation in response to external stimuli (events). This approach is best employed on hardware with many thousands of smaller compute cores with a fast, low‐latency interconnect, as opposed to traditional computers with fewer and faster cores. Event‐based computing is timely because it provides an alternative to traditional big computing, which suffers from immense infrastructural and power costs. This paper presents four case study applications, where an event‐based computing approach finds solutions to orders of magnitude more quickly than the equivalent traditional big compute approach, including problems in computational chemistry and condensed matter physics.
Journal Article
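The event-based paradigm described above can be reduced to its essentials: workers sit idle until an event arrives, compute in response, and may emit further events. The tiny synchronous event loop below is an illustrative assumption, not the paper's softcore machine.

```python
# Event-based computing sketch: a handler runs only when an event for it
# arrives; handler outputs either feed other handlers or become results.

from collections import deque

def run(handlers, initial_events):
    queue = deque(initial_events)
    results = []
    while queue:
        name, payload = queue.popleft()
        for out in handlers[name](payload):
            if out[0] in handlers:
                queue.append(out)     # event feeds another worker
            else:
                results.append(out)   # no handler: terminal output
    return results

handlers = {
    "square": lambda x: [("collect", x * x)],
}
out = run(handlers, [("square", 2), ("square", 3)])
assert out == [("collect", 4), ("collect", 9)]
```

On hardware with thousands of small cores, each handler maps to a core and the queue maps to the low-latency interconnect, so computation and communication overlap naturally.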
Impacts of Topology and Bandwidth on Distributed Shared Memory Systems
2023
As high-performance computing designs become increasingly complex, the importance of evaluating with simulation also grows. One of the most critical aspects of distributed computing design is the network architecture; different topologies and bandwidths have dramatic impacts on the overall performance of the system and should be explored to find the optimal design point. This work uses simulations developed to run in the existing Structural Simulation Toolkit v12.1.0 software framework to show that for a hypothetical test case, more complicated network topologies have better overall performance and performance improves with increased bandwidth, making them worth the additional design effort and expense. Specifically, the test case HyperX topology is shown to outperform the next best evaluated topology by thirty percent and is the only topology that did not experience diminishing performance gains with increased bandwidth.
Journal Article
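One reason topology dominates performance, as the abstract argues, is that the average shortest-path hop count bounds communication latency. The sketch below compares a 16-node ring with a 16-node hypercube by breadth-first search; it is illustrative only, and the paper's HyperX topology is richer than either.

```python
# Why topology matters: average hop count between node pairs bounds
# network latency. A hypercube's richer wiring beats a ring's.

from collections import deque

def avg_hops(adj):
    """Mean shortest-path length over all ordered node pairs (BFS)."""
    n = len(adj)
    total = 0
    for src in range(n):
        dist = {src: 0}
        q = deque([src])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    q.append(v)
        total += sum(dist.values())
    return total / (n * (n - 1))

n = 16
ring = [[(i - 1) % n, (i + 1) % n] for i in range(n)]
hypercube = [[i ^ (1 << b) for b in range(4)] for i in range(n)]

assert avg_hops(hypercube) < avg_hops(ring)  # richer wiring, fewer hops
```

The trade-off the paper evaluates is precisely this: richer topologies cost more links and design effort, but cut hop counts and keep scaling with added bandwidth for longer.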
The relationships between message passing, pairwise, Kermack–McKendrick and stochastic SIR epidemic models
by
Ball, Frank G.
,
Wilkinson, Robert R.
,
Sharkey, Kieran J.
in
Applications of Mathematics
,
Approximation
,
Communicable Diseases - epidemiology
2017
We consider a very general stochastic model for an SIR epidemic on a network which allows an individual’s infectious period, and the time it takes to contact each of its neighbours after becoming infected, to be correlated. We write down the message passing system of equations for this model and prove, for the first time, that it has a unique feasible solution. We also generalise an earlier result by proving that this solution provides a rigorous upper bound for the expected epidemic size (cumulative number of infection events) at any fixed time t > 0. We specialise these results to a homogeneous special case where the graph (network) is symmetric. The message passing system here reduces to just four equations. We prove that cycles in the network inhibit the spread of infection, and derive important epidemiological results concerning the final epidemic size and threshold behaviour for a major outbreak. For Poisson contact processes, this message passing system is equivalent to a non-Markovian pair approximation model, which we show has well-known pairwise models as special cases. We show further that a sequence of message passing systems, starting with the homogeneous one just described, converges to the deterministic Kermack–McKendrick equations for this stochastic model. For Poisson contact and recovery, we show that this convergence is monotone, from which it follows that the message passing system (and hence also the pairwise model) here provides a better approximation to the expected epidemic size at time t > 0 than the Kermack–McKendrick model.
Journal Article
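The Kermack–McKendrick limit mentioned in the abstract is the classic deterministic SIR system dS/dt = -βSI, dI/dt = βSI - γI, dR/dt = γI. A simple Euler-integration sketch of that limit, with illustrative parameter values not taken from the paper:

```python
# Deterministic Kermack-McKendrick SIR model integrated with Euler
# steps: susceptibles fall, recovered rise, and S + I + R is conserved.

def sir(beta, gamma, s0, i0, steps, dt=0.01):
    s, i, r = s0, i0, 0.0
    for _ in range(steps):
        new_inf = beta * s * i * dt   # mass-action infections
        new_rec = gamma * i * dt      # recoveries
        s, i, r = s - new_inf, i + new_inf - new_rec, r + new_rec
    return s, i, r

s, i, r = sir(beta=0.5, gamma=0.2, s0=0.99, i0=0.01, steps=5000)
assert abs(s + i + r - 1.0) < 1e-9   # population fractions conserved
assert s < 0.99 and r > 0.0          # the epidemic has spread
```

The paper's message passing systems sit between the stochastic network model and this deterministic limit, and (for Poisson contact and recovery) approach it monotonically from below.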
Rambo: a robust, reconfigurable atomic memory service for dynamic networks
by
Gilbert, Seth
,
Shvartsman, Alexander A.
,
Lynch, Nancy A.
in
Algorithms
,
Computer Communication Networks
,
Computer Hardware
2010
In this paper, we present Rambo, an algorithm for emulating a read/write distributed shared memory in a dynamic, rapidly changing environment. Rambo provides a highly reliable, highly available service, even as participants join, leave, and fail. In fact, the entire set of participants may change during an execution, as the initial devices depart and are replaced by a new set of devices. Even so, Rambo ensures that data stored in the distributed shared memory remains available and consistent. There are two basic techniques used by Rambo to tolerate dynamic changes. Over short intervals of time, replication suffices to provide fault-tolerance. While some devices may fail and leave, the data remains available at other replicas. Over longer intervals of time, Rambo copes with changing participants via reconfiguration, which incorporates newly joined devices while excluding devices that have departed or failed. The main novelty of Rambo lies in the combination of an efficient reconfiguration mechanism with a quorum-based replication strategy for read/write shared memory. The Rambo algorithm can tolerate a wide variety of aberrant behavior, including lost and delayed messages, participants with unsynchronized clocks, and, more generally, arbitrary asynchrony. Despite such behavior, Rambo guarantees that its data is stored consistently. We analyze the performance of Rambo during periods when the system is relatively well-behaved: messages are delivered in a timely fashion, reconfiguration is not too frequent, etc. We show that in these circumstances, read and write operations are efficient, completing in at most eight message delays.
Journal Article
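The quorum-based replication at the heart of the abstract above works because any two majorities of the replica set intersect, so a read quorum always overlaps the latest write quorum in at least one replica. A minimal sketch of that invariant, omitting Rambo's reconfiguration machinery entirely:

```python
# Majority-quorum read/write register: writes tag values with a
# monotonically increasing tag; reads take the highest-tagged value
# seen in their quorum. Majorities always intersect, so the latest
# write is always visible.

class Replica:
    def __init__(self):
        self.tag, self.value = 0, None

def write(quorum, tag, value):
    """Install (tag, value) at every replica in a majority quorum."""
    for rep in quorum:
        if tag > rep.tag:
            rep.tag, rep.value = tag, value

def read(quorum):
    """Return the value with the highest tag in a majority quorum."""
    latest = max(quorum, key=lambda rep: rep.tag)
    return latest.value

replicas = [Replica() for _ in range(5)]
write(replicas[:3], tag=1, value="x")   # majority write (3 of 5)
# Any majority read quorum overlaps the write quorum in >= 1 replica.
assert read(replicas[2:]) == "x"
```

Rambo's contribution is keeping this invariant intact while the replica set itself changes: reconfiguration swaps in a new quorum system without ever losing the intersection property between old and new configurations.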
A real-time operating system supporting distributed shared memory for embedded control systems
by
Chiba, Takahiro
,
Tamura, Yuji
,
Yokoyama, Takanori
in
Communication
,
Computer Communication Networks
,
Computer Science
2019
The paper presents a real-time operating system (RTOS) that provides a distributed shared memory (DSM) service for distributed embedded control systems. Model-based design is widely adopted in embedded control software design, and the source code of software modules can be generated from a controller model. The generated software modules exchange their input and output values through shared variables. We develop an RTOS with a DSM service to provide a location-transparent environment, in which distributed software modules can exchange input and output values through the DSM. The RTOS is an extension to OSEK OS. We use a real-time network called FlexRay, which is based on a time division multiple access (TDMA) protocol. The consistency of the DSM is maintained according to the order of data transfer through FlexRay, without using inter-node synchronization. The worst-case response time of the DSM is predictable if the FlexRay communication is well configured.
Journal Article
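The consistency argument in the abstract above rests on a TDMA property: every node observes updates in the same global slot order, so applying them in slot order yields identical DSM copies without any inter-node synchronization. An illustrative sketch of that idea (the function and variable names are assumptions, not the RTOS's actual API):

```python
# TDMA-ordered DSM sketch: updates are tagged with their FlexRay
# transmission slot; applying them in slot order makes every node's
# copy identical regardless of local arrival order.

def apply_schedule(schedule):
    """Apply (slot, variable, value) updates in global TDMA slot order."""
    memory = {}
    for _slot, var, value in sorted(schedule):  # slot order is global
        memory[var] = value
    return memory

# Updates arrive tagged with their transmission slot.
schedule = [(2, "rpm", 3000), (1, "rpm", 2500), (3, "temp", 90)]

node_a = apply_schedule(schedule)
node_b = apply_schedule(list(reversed(schedule)))  # different arrival order
assert node_a == node_b == {"rpm": 3000, "temp": 90}
```

Because the slot schedule is fixed at configuration time, the same property makes the DSM's worst-case response time analyzable, as the abstract notes.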