125 result(s) for "large-scale parallel computing"
Scalability of Viscoelastic Fluid Solvers Based on OpenFOAM-PETSc Framework in Large-Scale Parallel Computing
Enormous advances in the physics of complex fluids and soft matter over the last decades have rapidly transformed traditional industrial sectors (foods, personal care products, pharmaceuticals, paints, lubricants, ceramics, polymers, liquid crystals, high-performance fibers, and oil exploration and production), moving them into a digital era of formulation design and precision control over processing conditions from a molecular viewpoint, and fertilizing a new industrial revolution. The development of high-performance viscoelastic fluid solvers is of great significance for large-scale digital manufacturing. In the present work, the Portable, Extensible Toolkit for Scientific Computation (PETSc) has been successfully integrated into the popular OpenFOAM CFD toolbox for carrying out large-scale parallel computing of Turbulent Drag Reduction (TDR) and Elastic Turbulence (ET) in isotropic turbulent flow. Its scalability has been evaluated and compared with the scalability of the OpenFOAM-based viscoelastic fluid solvers. The results show that there are significant improvements.
Deploying and Optimizing Embodied Simulations of Large-Scale Spiking Neural Networks on HPC Infrastructure
Simulating the brain-body-environment trinity in closed loop is an attractive proposal to investigate how perception, motor activity and interactions with the environment shape brain activity, and vice versa. The relevance of this embodied approach, however, hinges entirely on the modeled complexity of the various simulated phenomena. In this article, we introduce a software framework that is capable of simulating large-scale, biologically realistic networks of spiking neurons embodied in a biomechanically accurate musculoskeletal system that interacts with a physically realistic virtual environment. We deploy this framework on the high-performance computing resources of the EBRAINS research infrastructure and we investigate the scaling performance by distributing computation across an increasing number of interconnected compute nodes. Our architecture is based on requested compute nodes as well as persistent virtual machines; this provides a high-performance simulation environment that is accessible to multi-domain users without expert knowledge, with a view to enabling users to instantiate and control simulations at custom scale via a web-based Graphical User Interface. Our simulation environment, entirely open source, is based on the Neurorobotics Platform developed in the context of the Human Brain Project, and the NEST simulator. We characterize the capabilities of our parallelized architecture for large-scale embodied brain simulations through two benchmark experiments, by investigating the effects of scaling compute resources on performance defined in terms of experiment runtime, brain instantiation and simulation time. The first benchmark is based on a large-scale balanced network, while the second one is a multi-region embodied brain simulation consisting of more than a million neurons and a billion synapses. Both benchmarks clearly show how scaling compute resources improves the aforementioned performance metrics in a near-linear fashion. The second benchmark in particular is indicative of both the potential and limitations of a highly distributed simulation in terms of a trade-off between computation speed and resource cost. Our simulation architecture is being prepared to be accessible for everyone as an EBRAINS service, thereby offering a community-wide tool with a unique workflow that should provide momentum to the investigation of closed-loop embodiment within the computational neuroscience community.
Bottom-Up Construction and 2:1 Balance Refinement of Linear Octrees in Parallel
In this article, we propose new parallel algorithms for the construction and 2:1 balance refinement of large linear octrees on distributed memory machines. Such octrees are used in many problems in computational science and engineering, e.g., object representation, image analysis, unstructured meshing, finite elements, adaptive mesh refinement, and N-body simulations. Fixed-size scalability and isogranular analysis of the algorithms using an MPI-based parallel implementation were performed on a variety of input data and demonstrated good scalability for different processor counts (1 to 1024 processors) on the Pittsburgh Supercomputing Center's TCS-1 AlphaServer. The results are consistent for different data distributions. Octrees with over a billion octants were constructed and balanced in less than a minute on 1024 processors. Like other existing algorithms for constructing and balancing octrees, our algorithms have $\mathcal{O}(N\log N)$ work and $\mathcal{O}(N)$ storage complexity. Under reasonable assumptions on the distribution of octants and the work per octant, the parallel time complexity is $\mathcal{O}(\frac{N}{n_p}\log(\frac{N}{n_p})+n_p\log n_p)$, where $N$ is the size of the final linear octree and $n_p$ is the number of processors.
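To get a feel for the parallel time bound quoted in this abstract, the expression can be evaluated for a few processor counts. The following is only a rough sketch: the function name `parallel_time_model` is hypothetical, all constant factors hidden by the big-O notation are dropped, and base-2 logarithms are an assumption (the base does not affect the asymptotic bound). The values it returns are unitless model terms, not predicted runtimes.

```python
import math


def parallel_time_model(n_octants: int, n_procs: int) -> float:
    """Evaluate N/n_p * log2(N/n_p) + n_p * log2(n_p).

    Hypothetical helper: constants are dropped and base-2 logs are assumed,
    so only the relative growth of the two terms is meaningful.
    """
    if n_procs < 1 or n_octants < n_procs:
        raise ValueError("need 1 <= n_procs <= n_octants")
    local = n_octants / n_procs           # octants per processor
    local_term = local * math.log2(local)  # per-processor work term
    global_term = n_procs * math.log2(n_procs)  # term that grows with n_p
    return local_term + global_term


if __name__ == "__main__":
    n = 2 ** 30  # on the order of a billion octants, as in the abstract
    for p in (1, 64, 1024):
        print(p, parallel_time_model(n, p))
```

In this model the first term shrinks as processors are added while the second grows, so the second term only matters once $n_p$ becomes comparable to $N/n_p$.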
Shear Decoupled Parallel Scalable Preconditioners for Nonlinear Thermo-Mechanical Coupled Contact Applications
The thermo-mechanical coupled contact problem must be solved in structural mechanical analysis, such as dam structural analysis. Because of the complexity of the coupled models, the discretized system is very difficult to solve. At the same time, in real applications such as dam structural analysis, the simulation domain has complex structures, which results in more than $10^8$ mesh elements for high-resolution simulations. Therefore, the scale of the discretized system is very large. In this paper, the discretized thermo-mechanical coupled contact system on hundreds of millions of unstructured mesh elements is solved in parallel by the Newton–Krylov method, in which the efficiency of the Krylov methods depends strongly on the preconditioning. An efficient preconditioning method is constructed for the thermo-mechanical coupled contact problem. Three steps are used to construct the preconditioner. Firstly, for the mechanical problem, the mechanical effect is analyzed for the dam structural analysis, and a preconditioner is constructed for the elasticity problem by omitting the shearing effect. As the dominant material in a dam structural analysis is rock-soil, and rock-soil exhibits an anti-shearing property, it is reasonable to omit the shearing effect when constructing the preconditioner. Secondly, a preconditioner is constructed for the thermo-mechanical model by omitting the coupling between the thermal and mechanical effects in the model. The preconditioner has a block diagonal structure, with each block being a diffusion operator. It is suitable for large-scale parallel computing since each block can be solved independently. Furthermore, since each block is a diffusion operator, a multigrid method can be employed to solve each block equation effectively. Finally, based on the preconditioning of the thermo-mechanical model, a preconditioner is constructed for the thermo-mechanical coupled contact problem by combining it with the dual mortar method for contact problems. Numerical results show that the preconditioning method is very effective, and that the convergence rate of the Krylov method improves dramatically when it is used to solve the thermo-mechanical coupled contact problem.
Survey on memory management techniques in heterogeneous computing systems
A major issue faced by data scientists today is how to scale up their processing infrastructure to meet the challenge of big data and high-performance computing (HPC) workloads. In today's HPC domain, multiple graphics processing units (GPUs) must be connected alongside CPUs to accomplish large-scale parallel computing. Data movement between the processor and on-chip or off-chip memory creates a major bottleneck in overall system performance. The CPU/GPU processes all the data in a computer's memory, and hence the speed of data movement to and from memory and the size of the memory affect computer speed. During memory access by any processing element, the memory management unit (MMU) controls the data flow of the computer's main memory and impacts system performance and power. Changes in dynamic random access memory (DRAM) architecture, integration of memory-centric hardware accelerators in the heterogeneous system, and Processing-in-Memory (PIM) are the techniques adopted from all the available shared resource management techniques to maximise system throughput. This survey study presents an analysis of various DRAM designs and their performance. The authors also focus on the architecture, functionality, and performance of different hardware accelerators and PIM systems to reduce memory access time. Some insights and potential directions toward enhancements to existing techniques are also discussed. The requirement for fast, reconfigurable, self-adaptive memory management schemes in the high-speed processing scenario motivates us to track the trend. An effective MMU handles memory protection, cache control and bus arbitration associated with the processors.
Performance Analysis of Homogeneous On-Chip Large-Scale Parallel Computing Architectures for Data-Parallel Applications
On-chip computing platforms are evolving from single-core bus-based systems to many-core network-based systems, which are referred to as On-chip Large-scale Parallel Computing Architectures (OLPCs) in the paper. Homogeneous OLPCs feature strong regularity and scalability due to their identical cores and routers. Data-parallel applications have parallel data subsets that are handled individually by the same program running on different cores. Therefore, data-parallel applications are able to obtain good speedup on homogeneous OLPCs. The paper addresses modeling the speedup performance of homogeneous OLPCs for data-parallel applications. When establishing the speedup performance model, the network communication latency and the ways of storing data of data-parallel applications are modeled and analyzed in detail. Two abstract concepts (equivalent serial packet and equivalent serial communication) are proposed to construct the network communication latency model. The uniform and hotspot traffic models are adopted to reflect the ways of storing data. Some useful suggestions are presented during the performance model’s analysis. Finally, three data-parallel applications are run on our cycle-accurate homogeneous OLPC experimental platform to validate the analytic results and demonstrate that our study provides a feasible way to estimate and evaluate the performance of data-parallel applications on homogeneous OLPCs.
Scalability Study on Large-Scale Parallel Finite Element Computing in PANDA Frame
A finite-element parallel computing frame, PANDA, and its implementation processes are introduced. To validate the parallel performance of the PANDA frame, a series of tests were carried out to obtain the computing scale and the speedup ratios. First, three large-scale models of a typical engineering clamp with different numbers of degrees of freedom (1.83 million, 7 million and 10 million) were created in MSC.Patran and translated into geometric-grid files that can be identified in the PANDA frame. Second, linear static parallel computations of the three cases were successfully carried out on large parallel computers with preconditioned conjugate gradient methods in the PANDA frame. The speedup ratios of the three cases were obtained with a maximum process number of 64. The results show that the PANDA frame is competent for carrying out large-scale parallel computing with 10 million degrees of freedom. At each scale, the parallel computing is accelerated nearly linearly as the number of processes increases; moreover, a super-linear speedup appears in some cases. The speedup curves show that the degree of linearity increases as the computing scale grows. The influence of different communication bandwidths on computing efficiency was also discussed. All the testing results indicate that the PANDA frame has excellent parallel performance and favorable computing scalability.
Machine Learning and Deep Learning frameworks and libraries for large-scale data mining: a survey
The combined impact of new computing resources and techniques with an increasing avalanche of large datasets is transforming many research areas and may lead to technological breakthroughs that can be used by billions of people. In recent years, Machine Learning and especially its subfield Deep Learning have seen impressive advances. Techniques developed within these two fields are now able to analyze and learn from huge amounts of real-world examples in disparate formats. While the number of Machine Learning algorithms is extensive and growing, their implementations through frameworks and libraries are also extensive and growing. The software development in this field is fast paced, with a large amount of open-source software coming from academia, industry, start-ups or wider open-source communities. This survey presents a recent time-slide comprehensive overview with comparisons as well as trends in development and usage of cutting-edge Artificial Intelligence software. It also provides an overview of massive parallelism support that is capable of scaling computation effectively and efficiently in the era of Big Data.
Topology optimization using PETSc: An easy-to-use, fully parallel, open source topology optimization framework
This paper presents a flexible framework for parallel and easy-to-implement topology optimization using the Portable and Extendable Toolkit for Scientific Computing (PETSc). The presented framework is based on a standardized and freely available library, and in the published form it solves the minimum compliance problem on structured grids, using standard FEM and filtering techniques. For completeness, a parallel implementation of the Method of Moving Asymptotes is included as well. The capabilities are exemplified by minimum compliance and homogenization problems. In both cases the unprecedented fine discretization reveals new design features, providing novel insight. The code can be downloaded from www.topopt.dtu.dk/PETSc.
Extremely Scalable Spiking Neuronal Network Simulation Code: From Laptops to Exascale Computers
State-of-the-art software tools for neuronal network simulations scale to the largest computing systems available today and enable investigations of large-scale networks of up to 10% of the human cortex at a resolution of individual neurons and synapses. Due to an upper limit on the number of incoming connections of a single neuron, network connectivity becomes extremely sparse at this scale. To manage computational costs, simulation software ultimately targeting the brain scale needs to fully exploit this sparsity. Here we present a two-tier connection infrastructure and a framework for directed communication among compute nodes accounting for the sparsity of brain-scale networks. We demonstrate the feasibility of this approach by implementing the technology in the NEST simulation code and we investigate its performance in different scaling scenarios of typical network simulations. Our results show that the new data structures and communication scheme prepare the simulation kernel for post-petascale high-performance computing facilities without sacrificing performance in smaller systems.