Catalogue Search | MBRL

Equivalence classes and conditional hardness in massively parallel computations

by Nanongkai, Danupon , Scquizzato, Michele in Algorithms , Complexity , Data processing

2022

The Massively Parallel Computation (MPC) model serves as a common abstraction of many modern large-scale data processing frameworks, and has been receiving increasing attention over the past few years, especially in the context of classical graph problems. So far, the only way to argue lower bounds for this model is to condition on conjectures about the hardness of some specific problems, such as graph connectivity on promise graphs that are either one cycle or two cycles, usually called the one cycle versus two cycles problem. This is unlike the traditional arguments based on conjectures about complexity classes (e.g., P≠NP), which are often more robust in the sense that refuting them would lead to groundbreaking algorithms for a whole range of problems. In this paper we present connections between problems and classes of problems that allow the latter type of arguments. These connections concern the class of problems solvable in a sublogarithmic number of rounds in the MPC model, denoted by MPC(o(log N)), and the standard space complexity classes L and NL, and suggest conjectures that are robust in the sense that refuting them would lead to many surprisingly fast new algorithms in the MPC model. We also obtain new conditional lower bounds, and prove new reductions and equivalences between problems in the MPC model. Specifically, our main results are as follows. Lower bounds conditioned on the one cycle versus two cycles conjecture can be instead argued under the L⊈MPC(o(log N)) conjecture: these two assumptions are equivalent, and refuting either of them would lead to o(log N)-round MPC algorithms for a large number of challenging problems, including list ranking, minimum cut, and planarity testing. In fact, we show that these problems and many others require asymptotically the same number of rounds as the seemingly much easier problem of distinguishing between a graph being one cycle or two cycles. Many lower bounds previously argued under the one cycle versus two cycles conjecture can be argued under an even more robust (thus harder to refute) conjecture, namely NL⊈MPC(o(log N)). Refuting this conjecture would lead to o(log N)-round MPC algorithms for an even larger set of problems, including all-pairs shortest paths, betweenness centrality, and all aforementioned ones. Lower bounds under this conjecture hold for problems such as perfect matching and network flow.
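For readers unfamiliar with the one cycle versus two cycles problem named in the abstract, the following is a minimal sequential sketch of the decision task only. It is not an MPC algorithm and says nothing about round complexity; the function name `is_single_cycle` and the edge-list input format are assumptions made for illustration, and the input is assumed to satisfy the promise (a simple graph in which every vertex has degree exactly 2).

```python
def is_single_cycle(n, edges):
    """Return True if the promised graph on vertices 0..n-1 is one cycle.

    Assumes the promise holds: the graph is a disjoint union of cycles,
    so every vertex has exactly two neighbours.
    """
    # Build adjacency lists from the undirected edge list.
    adj = {v: [] for v in range(n)}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)

    # Walk around the cycle containing vertex 0 and count its vertices.
    prev, cur, seen = 0, adj[0][0], 1
    while cur != 0:
        first, second = adj[cur]
        nxt = second if first == prev else first
        prev, cur = cur, nxt
        seen += 1

    # The graph is a single cycle exactly when the walk covered all vertices.
    return seen == n


one_cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]
two_cycles = [(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3)]
result_one = is_single_cycle(4, one_cycle)
result_two = is_single_cycle(6, two_cycles)
```

The walk takes time linear in the length of the cycle containing vertex 0, which is easy sequentially; the conjecture discussed in the abstract concerns how many communication rounds are needed when the edges are spread across many machines with limited memory.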

Journal Article

A Parallel Dissipation-Free and Dispersion-Optimized Explicit Time-Domain FEM for Large-Scale Room Acoustics Simulation

by Takeshi Okuzono , Takumi Yoshida , Kimihiro Sakagami in Absorbers , Accuracy , Acoustics

2022

Wave-based acoustics simulation methods such as the finite element method (FEM) are reliable computer simulation tools for predicting acoustics in architectural spaces. Nevertheless, their application to practical room acoustics design is difficult because of their high computational costs. Therefore, we propose herein a parallel wave-based acoustics simulation method using dissipation-free and dispersion-optimized explicit time-domain FEM (TD-FEM) for simulating room acoustics in large-scale scenes. It can model sound absorbers with locally reacting frequency-dependent impedance boundary conditions (BCs). The method can use domain decomposition method (DDM)-based parallel computing to compute acoustics in large rooms at kilohertz frequencies. After validation studies of the proposed method via impedance tube and small cubic room problems including frequency-dependent impedance BCs of two porous-type sound absorbers and a Helmholtz-type sound absorber, the efficiency of the method against two implicit TD-FEMs was assessed. Faster computations and equivalent accuracy were achieved. Finally, acoustics simulation of an auditorium of 2271 m³ presenting a problem size of about 150,000,000 degrees of freedom demonstrated the practicality of the DDM-based parallel solver. Using 512 CPU cores on a parallel computer system, the proposed parallel solver can compute impulse responses with 3 s time length, including frequency components up to 3 kHz, within 9000 s.

Journal Article

Permutation and Grouping Methods for Sharpening Gaussian Process Approximations

by Guinness, Joseph in Accuracy , Approximation , Comparative analysis

2018

Vecchia's approximate likelihood for Gaussian process parameters depends on how the observations are ordered, which has been cited as a deficiency. This article takes the alternative standpoint that the ordering can be tuned to sharpen the approximations. Indeed, the first part of the article includes a systematic study of how ordering affects the accuracy of Vecchia's approximation. We demonstrate the surprising result that random orderings can give dramatically sharper approximations than default coordinate-based orderings. Additional ordering schemes are described and analyzed numerically, including orderings capable of improving on random orderings. The second contribution of this article is a new automatic method for grouping calculations of components of the approximation. The grouping methods simultaneously improve approximation accuracy and reduce computational burden. In common settings, reordering combined with grouping reduces Kullback-Leibler divergence from the target model by more than a factor of 60 compared to ungrouped approximations with default ordering. The claims are supported by theory and numerical results with comparisons to other approximations, including tapered covariances and stochastic partial differential equations. Computational details are provided, including the use of the approximations for prediction and conditional simulation. An application to space-time satellite data is presented.

Journal Article

Distributed Linear Regression by Averaging

by Sheng, Yue , Dobriban, Edgar in Approximation , Confidence intervals , Datasets

2021

Distributed statistical learning problems arise commonly when dealing with large datasets. In this setup, datasets are partitioned over machines, which compute locally and communicate short messages. Communication is often the bottleneck. In this paper, we study one-step and iterative weighted parameter averaging in statistical linear models under data parallelism. We do linear regression on each machine, send the results to a central server and take a weighted average of the parameters. Optionally, we iterate, sending back the weighted average and doing local ridge regressions centered at it. How does this work compared to doing linear regression on the full data? Here, we study the performance loss in estimation and test error, and confidence interval length in high dimensions, where the number of parameters is comparable to the training data size. We find the performance loss in one-step weighted averaging, and also give results for iterative averaging. We also find that different problems are affected differently by the distributed framework. Estimation error and confidence interval length increase a lot, while prediction error increases much less. We rely on recent results from random matrix theory, where we develop a new calculus of deterministic equivalents as a tool of broader interest.
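The one-step procedure described in the abstract (fit locally on each machine, send the estimates to a central server, take a weighted average) can be sketched as follows. This is a minimal illustration under simplifying assumptions, not the authors' method: it uses a single predictor with no intercept so that only the standard library is needed, it simulates the machines as list partitions, and it defaults to weights proportional to local sample size, which is an assumption of this sketch rather than the weighting analysed in the paper. The names `ols_slope`, `one_step_weighted_average`, and `make_data` are hypothetical.

```python
import random


def ols_slope(xs, ys):
    """Least-squares slope for the no-intercept model y = beta * x."""
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    return sxy / sxx


def one_step_weighted_average(parts, weights=None):
    """Fit on each partition, then combine the local slopes.

    parts: list of (xs, ys) pairs, one per simulated machine.
    weights: optional list summing to 1. Defaults to sample-size
             weights (an assumption of this sketch).
    """
    local = [ols_slope(xs, ys) for xs, ys in parts]
    if weights is None:
        total = sum(len(xs) for xs, _ in parts)
        weights = [len(xs) / total for xs, _ in parts]
    return sum(w * b for w, b in zip(weights, local))


def make_data(n, beta, noise, rng):
    """Simulate n observations from y = beta * x + Gaussian noise."""
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ys = [beta * x + rng.gauss(0.0, noise) for x in xs]
    return xs, ys


rng = random.Random(0)
xs, ys = make_data(3000, beta=2.0, noise=0.5, rng=rng)

# Split the data across three simulated machines.
parts = [(xs[i::3], ys[i::3]) for i in range(3)]

beta_avg = one_step_weighted_average(parts)   # distributed estimate
beta_full = ols_slope(xs, ys)                 # full-data estimate
```

Comparing `beta_avg` with `beta_full` gives a rough feel for the question the abstract poses; the high-dimensional regime studied in the paper, where the number of parameters is comparable to the sample size, is not captured by this one-parameter example.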

Journal Article

Development of a 3D Hybrid Finite-Discrete Element Simulator Based on GPGPU-Parallelized Computation for Modelling Rock Fracturing Under Quasi-Static and Dynamic Loading Conditions

by Fukuda, Daisuke , Liu, Hongyuan , Fujii, Yoshiaki in Agreements , Algorithms , Compressive strength

2020

As a state-of-the-art computational method for simulating rock fracturing and fragmentation, the combined finite-discrete element method (FDEM) has become widely accepted since Munjiza (2004) published his comprehensive book on FDEM. This study developed a general-purpose graphic-processing-unit (GPGPU)-parallelized FDEM using the compute unified device architecture C/C++ based on the authors’ former sequential two-dimensional (2D) and three-dimensional (3D) Y-HFDEM IDE (integrated development environment) code. The theory and algorithm of the GPGPU-parallelized 3D Y-HFDEM IDE code are first introduced by focusing on the implementation of the contact detection algorithm, which is different from that in the sequential code, contact damping and contact friction. 3D modelling of the failure process of limestone under quasi-static loading conditions in uniaxial compressive strength (UCS) tests and Brazilian tensile strength (BTS) tests is then conducted using the GPGPU-parallelized 3D Y-HFDEM IDE code. The 3D FDEM modelling results show that mixed-mode I–II failures are the dominant failure mechanisms along the shear and splitting failure planes in the UCS and BTS models, respectively, with unstructured meshes. Pure mode I splitting failure planes and pure mode II shear failure planes are only possible in the UCS and BTS models, respectively, with structured meshes. Subsequently, 3D modelling of the dynamic fracturing of marble in dynamic Brazilian tests with a split Hopkinson pressure bar (SHPB) apparatus is conducted using the GPGPU-parallelized 3D HFDEM IDE code considering the entire SHPB testing system. The modelled failure process, final fracture pattern and time histories of the dynamic compressive wave, reflective tensile wave and transmitted compressive wave are compared quantitatively and qualitatively with those from experiments, and good agreement is achieved between them. The computing performance analysis shows the GPGPU-parallelized 3D HFDEM IDE code is 284 times faster than its sequential version and can achieve the computational complexity of O(N). The results demonstrate that the GPGPU-parallelized 3D Y-HFDEM IDE code is a valuable and powerful numerical tool for investigating rock fracturing under quasi-static and dynamic loading conditions in rock engineering applications, although very fine elements with maximum element size no bigger than the length of the fracture process zone must be used in the area where the fracturing process is modelled.

Journal Article

Parallel hybrid extragradient methods for pseudomonotone equilibrium problems and nonexpansive mappings

by Muu, Le Dung , Van Hieu, Dang , Anh, Pham Ky in Algebra , Algorithms , Computation

2016

In this paper we propose and analyze three parallel hybrid extragradient methods for finding a common element of the set of solutions of equilibrium problems involving pseudomonotone bifunctions and the set of fixed points of nonexpansive mappings in a real Hilbert space. Based on parallel computation we can reduce the overall computational effort under widely used conditions on the bifunctions and the nonexpansive mappings. A simple numerical example is given to illustrate the proposed parallel algorithms.

Journal Article

Parallel Implementation of Dispersive Tsunami Wave Modeling with a Nesting Algorithm for the 2011 Tohoku Tsunami

by Ando, Kazuto , Kato, Toshihiro , Baba, Toshitaka in Algorithms , Boussinesq equations , Computation

2015

Because of improvements in offshore tsunami observation technology, dispersion phenomena during tsunami propagation have often been observed in recent tsunamis, for example the 2004 Indian Ocean and 2011 Tohoku tsunamis. The dispersive propagation of tsunamis can be simulated by use of the Boussinesq model, but the model demands many computational resources. However, rapid progress has been made in parallel computing technology. In this study, we investigated a parallelized approach for dispersive tsunami wave modeling. Our new parallel software solves the nonlinear Boussinesq dispersive equations in spherical coordinates. A variable nested algorithm was used to increase spatial resolution in the target region. The software can also be used to predict tsunami inundation on land. We used the dispersive tsunami model to simulate the 2011 Tohoku earthquake on the Supercomputer K. Good agreement was apparent between the dispersive wave model results and the tsunami waveforms observed offshore. The finest bathymetric grid interval was 2/9 arcsec (approx. 5 m) along longitude and latitude lines. Use of this grid simulated tsunami soliton fission near the Sendai coast. Incorporating the three-dimensional shape of buildings and structures led to improved modeling of tsunami inundation.

Journal Article

Imperative Process Algebra and Models of Parallel Computation

by Middelburg, Cornelis A in Algebra , Communication , Complexity

2024

Studies of issues related to computability and computational complexity involve the use of a model of computation. Central in such a model are computational processes. Processes of this kind can be described using an imperative process algebra based on ACP (Algebra of Communicating Processes). In this paper, it is investigated whether the imperative process algebra concerned can play a role in the field of models of computation. It is demonstrated that the process algebra is suitable to describe in a mathematically precise way models of computation corresponding to existing models based on sequential, asynchronous parallel, and synchronous parallel random access machines as well as time and work complexity measures for those models.

Journal Article

A general construction for parallelizing Metropolis−Hastings algorithms

by Calderhead, Ben in Accuracy , Algorithms , Computational statistics

2014

Markov chain Monte Carlo methods (MCMC) are essential tools for solving many modern-day statistical and computational problems; however, a major limitation is the inherently sequential nature of these algorithms. In this paper, we propose a natural generalization of the Metropolis−Hastings algorithm that allows for parallelizing a single chain using existing MCMC methods. We do so by proposing multiple points in parallel, then constructing and sampling from a finite-state Markov chain on the proposed points such that the overall procedure has the correct target density as its stationary distribution. Our approach is generally applicable and straightforward to implement. We demonstrate how this construction may be used to greatly increase the computational speed and statistical efficiency of a variety of existing MCMC methods, including Metropolis-Adjusted Langevin Algorithms and Adaptive MCMC. Furthermore, we show how it allows for a principled way of using every integration step within Hamiltonian Monte Carlo methods; our approach increases robustness to the choice of algorithmic parameters and results in increased accuracy of Monte Carlo estimates with little extra computational cost.

Significance: Many computational problems in modern-day statistics are heavily dependent on Markov chain Monte Carlo (MCMC) methods. These algorithms allow us to evaluate arbitrary probability distributions; however, they are inherently sequential in nature due to the Markov property, which severely limits their computational speed. We propose a general approach that allows scalable parallelization of existing MCMC methods. We do so by defining a finite-state Markov chain on multiple proposals in a way that ensures asymptotic convergence to the correct stationary distribution. In example simulations, we demonstrate up to two orders of magnitude improvement in overall computational performance.

Journal Article

Component stability in low-space massively parallel computation

by Davies-Peck, Peter , Czumaj, Artur , Parter, Merav in Algorithms , Computation , Lower bounds

2024

In this paper, we study the power and limitations of component-stable algorithms in the low-space model of massively parallel computation (MPC). Recently Ghaffari, Kuhn and Uitto (FOCS 2019) introduced the class of component-stable low-space MPC algorithms, which are, informally, those algorithms for which the outputs reported by the nodes in different connected components are required to be independent. This very natural notion was introduced to capture most (if not all) of the known efficient MPC algorithms to date, and it was the first general class of MPC algorithms for which one can show non-trivial conditional lower bounds. In this paper we enhance the framework of component-stable algorithms and investigate its effect on the complexity of randomized and deterministic low-space MPC. Our key contributions include: 1. We revise and formalize the lifting approach of Ghaffari, Kuhn and Uitto. This requires a very delicate amendment of the notion of component stability, which allows us to fill in gaps in the earlier arguments. 2. We also extend the framework to obtain conditional lower bounds for deterministic algorithms and fine-grained lower bounds that depend on the maximum degree Δ. 3. We demonstrate a collection of natural graph problems for which deterministic component-unstable algorithms break the conditional lower bound obtained for component-stable algorithms. This implies that, in the context of deterministic algorithms, component-stable algorithms are conditionally weaker than the component-unstable ones. 4. We also show that the restriction to component-stable algorithms has an impact in the randomized setting. We present a natural problem which can be solved in O(1) rounds by a component-unstable MPC algorithm, but requires Ω(log log* n) rounds for any component-stable algorithm, conditioned on the connectivity conjecture. Altogether our results imply that component-stability might limit the computational power of the low-space MPC model, at least in certain contexts, paving the way for improved upper bounds that escape the conditional lower bound setting of Ghaffari, Kuhn, and Uitto.

Journal Article
