14,051 result(s) for "parallel algorithms"
A parallelizable method for two-dimensional wave propagation using subdomains in time with Multigrid and Waveform Relaxation
In this paper we compare the implicit schemes for the solution of the two-dimensional wave equation using Singlegrid and Multigrid methods. The discretization is performed using the Finite Difference Method, weighted in time by an established parameter. The parallelization of the algorithms is ensured by employing the Waveform Relaxation method, where numerical stability is achieved by applying the method of subdomains in time. The primary innovation of this work lies in the development of a high-order method that harnesses the parallelizability and robustness of the Multigrid method, enabling efficient solutions to the 2D wave equation. These methods also effectively mitigate oscillations that would otherwise significantly increase the maximum residual, a concern arising from the application of the standard Waveform Relaxation method.
Parallel Optimization of Program Instructions Using Genetic Algorithms
This paper describes an efficient solution to parallelize software program instructions, regardless of the programming language in which they are written. We solve the problem of the optimal distribution of a set of instructions on available processors. We propose a genetic algorithm to parallelize computations, using evolution to search the solution space. The stages of our proposed genetic algorithm are the choice of the initial population and its representation in chromosomes, the crossover, and the mutation operations customized to the problem being dealt with. In this paper, genetic algorithms are applied to the entire search space of the parallelization of the program instructions problem. This problem is NP-complete, so no polynomial-time algorithm is known that can scan the solution space and solve it exactly. The genetic algorithm-based method is general, and it is simple and efficient to implement because it can be scaled to a larger or smaller number of instructions that must be parallelized. The parallelization technique proposed in this paper was developed in the C# programming language, and our results confirm the effectiveness of our parallelization method. Experimental results obtained and presented for different working scenarios confirm the theoretical results, and they provide insight on how to improve the exploration of a search space that is too large to be searched exhaustively.
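The stages named in this abstract (chromosome representation, crossover, mutation) can be illustrated with a minimal sketch. It is not the authors' C# implementation: the cost list, the makespan fitness function, the truncation selection, and all parameter values are assumptions chosen for illustration, and dependencies between instructions are ignored.

```python
import random

def makespan(assignment, costs, n_procs):
    """Finish time of the busiest processor for one instruction-to-processor map."""
    loads = [0] * n_procs
    for cost, proc in zip(costs, assignment):
        loads[proc] += cost
    return max(loads)

def evolve(costs, n_procs, pop_size=40, generations=200, mutation_rate=0.05, seed=0):
    """Hypothetical GA: minimise makespan over assignments of instructions to processors."""
    rng = random.Random(seed)
    n = len(costs)
    # Chromosome: one gene per instruction, holding the index of its processor.
    population = [[rng.randrange(n_procs) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda a: makespan(a, costs, n_procs))
        survivors = population[: pop_size // 2]  # keep the better half (assumed selection rule)
        children = []
        while len(survivors) + len(children) < pop_size:
            mother, father = rng.sample(survivors, 2)
            cut = rng.randrange(1, n)  # single-point crossover
            child = mother[:cut] + father[cut:]
            for i in range(n):  # mutation: occasionally move an instruction elsewhere
                if rng.random() < mutation_rate:
                    child[i] = rng.randrange(n_procs)
            children.append(child)
        population = survivors + children
    return min(population, key=lambda a: makespan(a, costs, n_procs))

costs = [4, 7, 2, 9, 3, 5, 6, 1, 8, 2]  # made-up instruction costs
best = evolve(costs, n_procs=3)
print(best, makespan(best, costs, 3))
```

Because the better half of each generation is carried over unchanged, the best makespan never gets worse from one generation to the next.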
Efficient simulation of neural development using shared memory parallelization
The Neural Development Simulator, NeuroDevSim, is a Python module that simulates the most important aspects of brain development: morphological growth, migration, and pruning. It uses an agent-based modeling approach inherited from the NeuroMaC software. Each cycle has agents called fronts execute model-specific code. In the case of a growing dendritic or axonal front, this will be a choice between extension, branching, or growth termination. Somatic fronts can migrate to new positions and any front can be retracted to prune parts of neurons. Collision detection prevents new or migrating fronts from overlapping with existing ones. NeuroDevSim is a multi-core program that uses an innovative shared memory approach to achieve parallel processing without messaging. We demonstrate linear strong parallel scaling up to 96 cores for large models and have run these successfully on 128 cores. Most of the shared memory parallelism is achieved without memory locking. Instead, cores have only write privileges to private sections of arrays, while being able to read the entire shared array. Memory conflicts are avoided by a coding rule that allows only active fronts to use methods that need writing access. The exception is collision detection, which is needed to avoid the growth of physically overlapping structures. For collision detection, a memory-locking mechanism was necessary to control access to grid points that register the location of nearby fronts. A custom approach using a serialized lock broker was able to manage both read and write locking. NeuroDevSim allows easy modeling of most aspects of neural development for models simulating a few complex or thousands of simple neurons or a mixture of both.
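The access rule described in this abstract (each core writes only to its private section of an array but may read the whole array) can be sketched roughly as follows. This is not NeuroDevSim code: it uses Python threads rather than processes with shared memory, and the names `shared`, `totals`, `SECTION`, and `worker` are hypothetical. The barrier only separates the write phase from the read phase, loosely mirroring a simulation cycle.

```python
import threading

N_WORKERS = 4
SECTION = 5
shared = [0] * (N_WORKERS * SECTION)  # one array visible to every worker
totals = [0] * N_WORKERS              # one private result slot per worker
barrier = threading.Barrier(N_WORKERS)

def worker(wid):
    start = wid * SECTION
    # Write phase: each worker touches only its own section, so no lock is needed.
    for i in range(start, start + SECTION):
        shared[i] = wid + 1
    barrier.wait()  # every write is finished before anyone reads
    # Read phase: each worker may read the entire shared array.
    totals[wid] = sum(shared)

threads = [threading.Thread(target=worker, args=(w,)) for w in range(N_WORKERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(totals)
```

Since no two workers ever write to the same index, the result does not depend on thread scheduling.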
Performance evaluation of GPU-based parallel sorting algorithms
Sorting can be approached in two main ways: sequentially and in parallel. In sequential sorting, data is processed in a single-threaded manner, which can be slow for large datasets. In contrast, parallel sorting divides the task across multiple processing units, enabling faster results by processing data simultaneously. Furthermore, Compute Unified Device Architecture (CUDA) technology enables developers to leverage GPU power for general-purpose parallel computing, significantly accelerating tasks like sorting. This paper investigates the GPU-based parallelization of merge sort (MS), quick sort (QS), bubble sort (BS), radix top-k selection sort (RS), and slow sort (SS), presenting optimized algorithms designed for efficient sorting of large datasets using modern GPUs. The primary objective is to evaluate the performance of these algorithms on GPUs utilizing CUDA, with a focus on analyzing both parallel time complexity and space complexity across various data types. Experiments are conducted on four dataset scenarios: randomly generated data, reverse-sorted data, already-sorted data, and nearly-sorted data. Also, the performance of GPU-accelerated implementations is compared with their sequential counterparts to assess improvements in computational efficiency and scalability. Earlier GPU-based implementations of this type typically achieved acceleration rates between 2× and 9× over scalar CPU code. With newer GPU enhancements, including parallel-aware primitives and radix- or merge-optimized operations, acceleration rates have seen significant improvement. Our experiments indicate that GPU-based Radix Sort achieves a significant speedup of approximately 50× (sequential: 240.8 ms, parallel: 4.83 ms) when sorting 10 million random elements. Quick Sort and Merge Sort have 97× and 103× speedups, respectively (Quick: 1461.97 ms vs. 15.1 ms; Merge: 2212.33 ms vs. 21.4 ms). Bubble Sort, while significantly improving in parallel (123,321.9 ms to 7377.8 ms for an ≈17× improvement), is considerably worse overall. Slow Sort demonstrates a moderate but consistent acceleration, reducing execution time from 74.07 ms in the sequential version to 3.99 ms on the GPU, yielding an ≈18.6× speedup. These experimental findings confirm that the new single-GPU implementations can achieve speedups ranging from 17× to over 100×, surpassing the typical gains reported in previous generations and comparable to or exceeding the acceleration rates reported for cutting-edge parallel sorting algorithms in recent studies.
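The speedup figures quoted in this abstract follow from dividing each sequential time by the corresponding GPU time. A short sketch of that arithmetic, using only the timings stated above (the dictionary layout and variable names are illustrative):

```python
# (sequential ms, GPU ms) as reported in the abstract
timings_ms = {
    "Radix Sort": (240.8, 4.83),
    "Quick Sort": (1461.97, 15.1),
    "Merge Sort": (2212.33, 21.4),
    "Bubble Sort": (123321.9, 7377.8),
    "Slow Sort": (74.07, 3.99),
}

# Speedup = sequential time / parallel time
speedups = {name: seq / par for name, (seq, par) in timings_ms.items()}

for name, s in speedups.items():
    print(f"{name}: {s:.1f}x")
```

Note that a large speedup does not imply a fast algorithm: Bubble Sort still takes several seconds on the GPU even after its roughly 17× improvement.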
GPU parallel acceleration of transient simulations of open channel and pipe combined flows
Simulating the transient processes in complex water transmission systems is time-consuming, and there is demand for improving computational efficiency through parallelization on CPU clusters or on even faster GPU platforms. This paper proposes an approach to accelerate the transient simulations of open channel and pipe combined flows on a single GPU chip. The Saint-Venant equations for open channel flows are solved using the method of characteristics (MOC), whose inherent parallelism can be well exploited by GPU implementations in the thread-level parallelism structure of Compute Unified Device Architecture (CUDA). The sub-processes, including open channel computation, pipe flow computation and connecting boundary treatment, are implemented by different kernels. The procedures are first verified by analyzing the parallel computation efficiency of hydraulic transient processes in an open channel. Then the transient processes of a practical engineering project, which involves both open channel flow and pressurized pipe flow, are simulated. The GPU kernels are found to be memory-bandwidth bound, and the proposed single-chip GPU parallelization can achieve speedup ratios of up to several hundred compared to the sequential counterpart on a single CPU chip.
Graph partitioning and graph clustering : 10th DIMACS Implementation Challenge Workshop, February 13-14, 2012, Georgia Institute of Technology, Atlanta, GA
Graph partitioning and graph clustering are ubiquitous subtasks in many applications where graphs play an important role. Generally speaking, both techniques aim at the identification of vertex subsets with many internal and few external edges. To name only a few, problems addressed by graph partitioning and graph clustering algorithms are: What are the communities within an (online) social network? How do I speed up a numerical simulation by mapping it efficiently onto a parallel computer? How must components be organised on a computer chip such that they can communicate efficiently with each other? What are the segments of a digital image? Which functions are certain genes (most likely) responsible for? The 10th DIMACS Implementation Challenge Workshop was devoted to determining realistic performance of algorithms where worst case analysis is overly pessimistic and probabilistic models are too unrealistic. Articles in the volume describe and analyse various experimental data with the goal of getting insight into realistic algorithm performance in situations where analysis fails. This book is published in cooperation with the Center for Discrete Mathematics and Theoretical Computer Science.
Parallel scientific computing
Scientific computing has become an indispensable tool in numerous fields, such as physics, mechanics, biology, finance and industry. For example, it enables us, thanks to efficient algorithms adapted to current computers, to simulate, without the help of models or experimentation, the deflection of beams in bending, the sound level in a theater room or a fluid flowing around an aircraft wing. This book presents the scientific computing techniques applied to parallel computing for the numerical simulation of large-scale problems; these problems result from systems modeled by partial differential equations. Computing concepts will be tackled via examples. Implementation and programming techniques resulting from the finite element method will be presented for direct solvers, iterative solvers and domain decomposition methods, along with an introduction to MPI and OpenMP.
Finding Structure with Randomness: Probabilistic Algorithms for Constructing Approximate Matrix Decompositions
Low-rank matrix approximations, such as the truncated singular value decomposition and the rank-revealing QR decomposition, play a central role in data analysis and scientific computing. This work surveys and extends recent research which demonstrates that randomization offers a powerful tool for performing low-rank matrix approximation. These techniques exploit modern computational architectures more fully than classical methods and open the possibility of dealing with truly massive data sets. This paper presents a modular framework for constructing randomized algorithms that compute partial matrix decompositions. These methods use random sampling to identify a subspace that captures most of the action of a matrix. The input matrix is then compressed—either explicitly or implicitly—to this subspace, and the reduced matrix is manipulated deterministically to obtain the desired low-rank factorization. In many cases, this approach beats its classical competitors in terms of accuracy, robustness, and/or speed. These claims are supported by extensive numerical experiments and a detailed error analysis. The specific benefits of randomized techniques depend on the computational environment. Consider the model problem of finding the k dominant components of the singular value decomposition of an m × n matrix. (i) For a dense input matrix, randomized algorithms require O(mn log(k)) floating-point operations (flops) in contrast to O(mnk) for classical algorithms. (ii) For a sparse input matrix, the flop count matches classical Krylov subspace methods, but the randomized approach is more robust and can easily be reorganized to exploit multi-processor architectures. (iii) For a matrix that is too large to fit in fast memory, the randomized techniques require only a constant number of passes over the data, as opposed to O(k) passes for classical algorithms. In fact, it is sometimes possible to perform matrix approximation with a single pass over the data.
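The three stages described in this abstract (random sampling to find a subspace, compression of the matrix to that subspace, then a deterministic factorization) can be sketched with NumPy. This is a minimal illustration, not the paper's reference code: the function name `randomized_svd`, the `oversample` parameter, and the test matrix are assumptions. It uses a plain Gaussian test matrix, so it costs O(mnk) flops rather than the O(mn log(k)) obtainable with structured random matrices.

```python
import numpy as np

def randomized_svd(A, k, oversample=10, seed=0):
    """Approximate the k dominant singular triplets of A (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    ell = min(n, k + oversample)            # a few extra samples improve accuracy
    Omega = rng.standard_normal((n, ell))   # random test matrix
    Y = A @ Omega                           # sample the range of A
    Q, _ = np.linalg.qr(Y)                  # orthonormal basis for the sampled subspace
    B = Q.T @ A                             # compress A to that subspace
    U_small, s, Vt = np.linalg.svd(B, full_matrices=False)  # deterministic step
    U = Q @ U_small                         # map left vectors back to the original space
    return U[:, :k], s[:k], Vt[:k, :]

# Made-up test case: a 200 x 80 matrix of exact rank 5
rng = np.random.default_rng(1)
A = rng.standard_normal((200, 5)) @ rng.standard_normal((5, 80))
U, s, Vt = randomized_svd(A, k=5)
err = np.linalg.norm(A - (U * s) @ Vt) / np.linalg.norm(A)
print(err)
```

The product `A @ Omega` is the only step that touches the full matrix before compression, which is why the approach adapts well to multi-processor and out-of-core settings.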
Implementation of Parallel Algorithm Technology for Time Series Data Mining
With the rapid development of computer, Internet and artificial intelligence technology, the amount of global data has exploded. However, the single-machine serial mode of traditional data mining cannot be directly transplanted to the cloud platform. Only by parallelizing and improving many classic data mining algorithms can the cloud computing platform and data mining be effectively combined. The research and implementation of parallel algorithm technology for time series data mining is therefore of great significance, and it is the subject of this paper. The paper uses literature review, mathematical statistics, logical analysis and other research methods to study parallel algorithm technology for time series data mining, mainly to explore time series data mining and visualization technology. It embodies the design ideas of big data analysis tools and demonstrates their power and market value through the display of the platform. The results show that, on the same data set and in the same experimental environment, the improved parallel collaborative filtering algorithm ACF proposed in this paper runs faster than the parallel algorithm MCF based on the co-occurrence matrix, and the time difference becomes more pronounced as the data set grows.