Catalogue Search | MBRL

Toward Real-Time Scalable Rigid-Body Simulation Using GPU-Optimized Collision Detection and Response

By Hong, Min; Sung, Nak-Jun in Accuracy, Algorithms, Analysis

2025

We propose a GPU-parallelized collision-detection and response framework for rigid-body dynamics, designed to efficiently handle densely populated 3D simulations in real time. The method combines explicit Euler time integration with a hierarchical Octree–AABB collision-detection scheme, enabling early pruning and localized refinement of contact checks. To resolve collisions, we employ a two-step response algorithm that integrates non-penetration correction and impulse-based velocity updates, stabilized through smoothing, clamping, and bias mechanisms. The framework is fully implemented within Unity3D using compute shaders and optimized GPU kernels. Experiments across multiple mesh models and increasing object counts demonstrate that the proposed hierarchical configuration significantly improves scalability and frame stability compared to conventional flat AABB methods. In particular, a two-level hierarchy achieves the best trade-off between spatial resolution and computational cost, maintaining interactive frame rates (≥30 fps) under high-density scenarios. These results suggest the practical applicability of our method to real-time simulation systems involving complex collision dynamics.
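The three ingredients named in the abstract (explicit Euler integration, an AABB overlap test, and an impulse-based velocity update) can be illustrated with a minimal serial sketch. This is not the authors' compute-shader code: the `Body` class, the function names, the gravity vector, and the restitution value are all assumptions made for illustration, and the octree pruning, non-penetration correction, smoothing, clamping, and bias terms are omitted.

```python
from dataclasses import dataclass


@dataclass
class Body:
    pos: list    # centre of the box [x, y, z]
    vel: list    # linear velocity [vx, vy, vz]
    half: list   # AABB half-extents along each axis
    mass: float


def euler_step(b, dt, gravity=(0.0, -9.81, 0.0)):
    """Explicit Euler: move with the current velocity, then update the velocity."""
    for i in range(3):
        b.pos[i] += b.vel[i] * dt
        b.vel[i] += gravity[i] * dt


def aabb_overlap(a, b):
    """Two AABBs overlap only if their extents overlap on every axis."""
    return all(abs(a.pos[i] - b.pos[i]) <= a.half[i] + b.half[i] for i in range(3))


def apply_impulse(a, b, normal, restitution=0.5):
    """Impulse along a unit contact normal pointing from a to b (no rotation)."""
    rel = sum((b.vel[i] - a.vel[i]) * normal[i] for i in range(3))
    if rel >= 0.0:
        return  # bodies are already separating along the normal
    j = -(1.0 + restitution) * rel / (1.0 / a.mass + 1.0 / b.mass)
    for i in range(3):
        a.vel[i] -= j * normal[i] / a.mass
        b.vel[i] += j * normal[i] / b.mass
```

In a hierarchical scheme, a cheap test such as `aabb_overlap` would run only on pairs that survive the coarser octree levels.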

Journal Article


Moving Towards Large-Scale Particle Based Fluid Simulation in Unity 3D

By Hong, Min; Waseem, Muhammad in Algorithms, Efficiency, Fluid dynamics

2025

Large-scale particle-based fluid simulations present significant computational challenges, particularly in achieving interactive frame rates while maintaining visual quality. Unity3D’s widespread adoption in game development, VR/AR applications, and scientific visualization creates a unique need for efficient fluid simulation within its ecosystem. This paper presents a GPU-accelerated Smoothed Particle Hydrodynamics (SPH) framework implemented in Unity3D that addresses these challenges through several key innovations. Previous GPU-accelerated SPH implementations typically struggle to scale beyond 100,000 particles while maintaining real-time performance. We introduce a novel fusion of Count Sort with Parallel Prefix Scan for spatial hashing that transforms the traditionally expensive O(n²) neighborhood search into an efficient O(n) operation, significantly outperforming traditional GPU sorting algorithms in particle-based simulations. Our implementation uses a Structure of Arrays (SoA) memory layout optimized for GPU compute shaders, achieving 30–45% higher computational throughput than traditional Array of Structures (AoS) approaches. Performance evaluations demonstrate that our method achieves throughput rates up to 168,600 particles/ms while maintaining consistent 5.7–6.0 ms frame times across particle counts from 10,000 to 1,000,000. The framework maintains interactive frame rates (>30 FPS) with up to 500,000 particles and remains responsive even at 1 million particles. Collision rates approaching 1.0 indicate near-optimal hash distribution, and the adaptive time-stepping mechanism adds minimal computational overhead (2–5%) while significantly improving simulation stability. These innovations enable real-time, large-scale fluid simulations with applications spanning visual effects, game development, and scientific visualization.
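The idea of pairing a counting sort with a prefix scan for spatial hashing can be sketched serially as follows. This is an illustrative CPU version, not the paper's compute-shader implementation: the function names, the hash constants (large primes often used for spatial hashing), and the table size are assumptions. Each pass (count, scan, scatter) is linear in the number of particles or buckets; on a GPU the scan would be a parallel prefix sum.

```python
def cell_of(p, cell_size):
    """Integer grid cell containing point p."""
    return tuple(int(c // cell_size) for c in p)


def hash_cell(cell, table_size):
    """Hash a 3D cell index into a bucket (constants are an assumed choice)."""
    x, y, z = cell
    return ((x * 73856093) ^ (y * 19349663) ^ (z * 83492791)) % table_size


def build_grid(positions, cell_size, table_size):
    keys = [hash_cell(cell_of(p, cell_size), table_size) for p in positions]
    counts = [0] * table_size
    for k in keys:                       # pass 1: count particles per bucket
        counts[k] += 1
    starts = [0] * (table_size + 1)
    for b in range(table_size):          # pass 2: exclusive prefix scan
        starts[b + 1] = starts[b] + counts[b]
    cursor = list(starts[:table_size])
    sorted_ids = [0] * len(positions)
    for i, k in enumerate(keys):         # pass 3: scatter into sorted order
        sorted_ids[cursor[k]] = i
        cursor[k] += 1
    return starts, sorted_ids


def neighbours(i, positions, cell_size, table_size, starts, sorted_ids, radius):
    """Indices within `radius` of particle i; requires radius <= cell_size."""
    cx, cy, cz = cell_of(positions[i], cell_size)
    seen, found = set(), []
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            for dz in (-1, 0, 1):
                b = hash_cell((cx + dx, cy + dy, cz + dz), table_size)
                if b in seen:            # two cells may share a bucket
                    continue
                seen.add(b)
                for j in sorted_ids[starts[b]:starts[b + 1]]:
                    d2 = sum((positions[i][a] - positions[j][a]) ** 2 for a in range(3))
                    if j != i and d2 <= radius * radius:
                        found.append(j)
    return sorted(found)
```

Because every particle only inspects the 27 buckets around its own cell, the per-particle work stays bounded when the particle density is bounded, which is what makes the overall search linear in practice.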

Journal Article


The inversion of density structure by graphic processing unit (GPU) and identification of igneous rocks in Xisha area

By Wu, Shiguo; Zhang, Jian; Yu, Lei in Algorithms, Decomposition, Deep water

2014

Organic reefs, the targets of deep-water petroleum exploration, are widely developed in the Xisha area. However, concealed igneous rocks beneath the seabed have nearly the same wave impedance as the organic reefs, so their similar seismic reflection characteristics interfere with exploration. The density and magnetism of organic reefs, by contrast, differ markedly from those of igneous rocks, which makes gravity and magnetic data well suited to telling the two apart. First, frequency decomposition was applied to the free-air gravity anomaly in the Xisha area to obtain the 2D subdivision of the gravity anomaly and magnetic anomaly in the vertical direction. The horizontal distribution of igneous rocks can then be determined from the high-frequency field, the low-frequency field, and the rocks' physical properties. Next, 3D forward modeling of the gravitational field was carried out to establish a density model of the area, with reference to rock physical properties from earlier research. A genetic algorithm running in parallel on a graphics processing unit (GPU) was then used for 3D inversion of the gravity anomaly in the Xisha target area, yielding the 3D density structure of the area. In this way, the igneous rocks can be confined to a certain depth range according to their density. Frequency decomposition combined with GPU-parallel genetic-algorithm 3D inversion of the gravity anomaly proved to be a useful method for locating igneous rocks in their 3D geological position. Organic reefs and igneous rocks can thus be distinguished, which provides advance information for further exploration.

Journal Article


Real-time ultrasound image reconstruction as an inverse problem on a GPU

By Maia, Joaquim M.; Bueno, Paulo R.; Zibetti, Marcelo V. W. in Acoustics, Algorithms, Computer Graphics

2020

Ultrasonic image reconstruction methods based on inverse problems have been shown to produce sharp, high-quality images by using more information about the acquisition process. This improved reconstruction has a high computational cost, usually requiring the solution of large systems, which makes real-time imaging very difficult. Parallelizing the reconstruction using graphics processing units (GPUs) can significantly accelerate this processing, but the amount of memory needed by current system models is high for current GPU capacity. This paper presents a new system model that halves this memory requirement; it exploits the symmetry of the point spread functions (PSFs) of the system matrix that occurs when symmetric transducers are used for acquisition. In this case, only one of the two symmetric PSFs needs to be stored; the other is produced by reordering the stored one. Thus, we can reconstruct ultrasound images that are twice as large, making real-time reconstruction on a GPU possible for this application.

Journal Article


Parallel cloth simulation with effective collision detection for interactive AR application

By Hong, Min; Kim, Minsang; Sung, Nak-Jun in Augmented reality, Cloth, Collision avoidance

2019

In this paper, we present a parallel cloth simulation with an efficient collision detection algorithm for interactive AR applications. In the first step of the proposed method, a set of sphere colliders is automatically defined for the 3D moving object that collides with the cloth model, so that collision detection remains effective even on low-end devices. In the second step, collision detection and handling between the sphere colliders and the cloth model are performed in parallel. We propose an efficient sphere-based collision handling method that prevents the cloth from penetrating the object, which can happen when the mesh resolution of the cloth model is low. The proposed method was implemented as a plugin for Unity, which is widely used for real-time game development. Comparative experiments against the cloth object provided by default in Unity were performed to analyze the performance of the proposed method. The results confirm that the proposed method reduces the cumbersome work of manually building colliders on a 3D model and expresses more accurate and plausible behavior of cloth that collides with the object.

Journal Article


Parallel cloth simulation with GPGPU

By Hong, Min; Choi, Yoo-Joo; Choi, Young-Hwan in Algorithms, Central processing units, Cloth

2018

In a 3D simulation, numerous physical and numerical calculations are required to represent an object realistically. Existing CPU (central processing unit) technology, however, cannot handle such a large computational load in real time. With recent advances in hardware, the GPU (graphics processing unit) can be used not only for conventional rendering operations but also for general-purpose computation. This paper proposes a mass-spring system that applies GPGPU (general-purpose computing on GPUs), with CPU and GPU versions tested in PC and mobile environments. In the test, a virtual cloth modeled as a mass-spring system was freely dropped onto a table, and the CPU and GPU performances were compared. Compared with the CPU, GPU computational performance improved by 9.41 times on the PC and 45.11 times on mobile devices. The proposed GPU mass-spring system was then implemented with an edge-centric algorithm and a node-centric algorithm. The edge-centric algorithm is divided into two parts: one for the spring-force calculation and one for the node-position calculation. In the node-centric algorithm, these two parts are combined into a single computational process. The computational speeds of the two algorithms were measured. The node-centric algorithm is faster than the edge-centric algorithm in the PC environment, but the edge-centric algorithm is faster in the mobile environment.
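The difference between the edge-centric and node-centric organisations described above can be sketched serially. This is an assumed illustration, not the authors' GPU code: the function names, the Hooke spring model without damping or gravity, and the semi-implicit Euler integrator are choices made here. The edge-centric version loops over springs to accumulate forces and then loops over nodes to integrate; the node-centric version visits each node once, gathers the forces of its incident springs, and integrates in the same pass.

```python
def spring_force(pi, pj, rest, k):
    """Hooke force acting on node i from its spring to node j."""
    d = [pj[a] - pi[a] for a in range(3)]
    length = sum(c * c for c in d) ** 0.5
    if length == 0.0:
        return [0.0, 0.0, 0.0]
    s = k * (length - rest) / length
    return [s * c for c in d]


def integrate(p, v, f, mass, dt):
    """Semi-implicit Euler step for one node (integrator is an assumption)."""
    v2 = [v[a] + dt * f[a] / mass for a in range(3)]
    p2 = [p[a] + dt * v2[a] for a in range(3)]
    return p2, v2


def step_edge_centric(pos, vel, springs, k, mass, dt):
    forces = [[0.0, 0.0, 0.0] for _ in pos]
    for i, j, rest in springs:                 # part 1: one unit of work per spring
        f = spring_force(pos[i], pos[j], rest, k)
        for a in range(3):
            forces[i][a] += f[a]
            forces[j][a] -= f[a]
    out = [integrate(pos[n], vel[n], forces[n], mass, dt)  # part 2: per node
           for n in range(len(pos))]
    return [p for p, _ in out], [v for _, v in out]


def build_adjacency(n, springs):
    adj = [[] for _ in range(n)]
    for i, j, rest in springs:
        adj[i].append((j, rest))
        adj[j].append((i, rest))
    return adj


def step_node_centric(pos, vel, adj, k, mass, dt):
    new_pos, new_vel = [], []
    for i in range(len(pos)):                  # single pass: one unit per node
        f = [0.0, 0.0, 0.0]
        for j, rest in adj[i]:
            fij = spring_force(pos[i], pos[j], rest, k)
            for a in range(3):
                f[a] += fij[a]
        p, v = integrate(pos[i], vel[i], f, mass, dt)
        new_pos.append(p)
        new_vel.append(v)
    return new_pos, new_vel
```

Both functions read only the old positions and return new arrays, so they produce the same state up to floating-point rounding. In a parallel setting the edge-centric accumulation has several springs writing to the same node, while the node-centric version evaluates each spring twice; which cost dominates is hardware-dependent, consistent with the differing PC and mobile results reported above.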

Journal Article


Enhancing the performance of the aggregated bit vector algorithm in network packet classification using GPU

By Tahouri, Razieh; Abbasi, Mahdi; Rafiee, Milad in Accelerators, Aggregated bit vector, Algorithms

2019

Packet classification is a computationally intensive, highly parallelizable task in many advanced network systems like high-speed routers and firewalls that enable different functionalities through discriminating incoming traffic. Recently, graphics processing units (GPUs) have been exploited as efficient accelerators for parallel implementation of software classifiers. The aggregated bit vector is a highly parallelizable packet classification algorithm. In this work, first we present a parallel kernel for running this algorithm on GPUs. Next, we adapt an asymptotic analysis method which predicts any empirical result of the proposed kernel. Experimental results not only confirm the efficiency of the proposed parallel kernel but also reveal the accuracy of the analysis method in predicting important trends in experimental results.
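For readers unfamiliar with the aggregated bit vector idea, the following serial sketch shows its core step under simplifying assumptions. It is not the parallel kernel from the paper: the rule format (one inclusive range per field), the function names, and the chunk size are hypothetical, and the per-field bit vectors are computed on the fly here rather than precomputed in lookup structures. Each field yields a bit vector marking the rules it matches; a short aggregate vector holds one bit per chunk of rules, and a full chunk is examined only when every field's aggregate bit for it is set.

```python
CHUNK = 4  # rules summarised by one aggregate bit (assumed value)


def field_vector(rules, field, value):
    """Bit i is 1 when rule i matches `value` on this field."""
    return [1 if r[field][0] <= value <= r[field][1] else 0 for r in rules]


def aggregate(bits, chunk=CHUNK):
    """One bit per chunk: 1 when any bit inside the chunk is set."""
    return [1 if any(bits[i:i + chunk]) else 0 for i in range(0, len(bits), chunk)]


def classify(rules, packet, chunk=CHUNK):
    """Index of the first (highest-priority) matching rule, or None."""
    vectors = [field_vector(rules, f, v) for f, v in enumerate(packet)]
    aggs = [aggregate(vec, chunk) for vec in vectors]
    for c in range(len(aggs[0])):
        if not all(agg[c] for agg in aggs):
            continue  # some field matches nothing in this chunk: skip it
        # The aggregate can give a false positive, so check the real bits.
        for i in range(c * chunk, min((c + 1) * chunk, len(rules))):
            if all(vec[i] for vec in vectors):
                return i
    return None
```

The aggregate test only says that each field matches some rule in the chunk, not necessarily the same rule, which is why the inner loop still intersects the full bits.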

Journal Article


Lattice Boltzmann Simulation of Solid Particles Motion in a Three Dimensional Flow using Smoothed Profile Method

By Jafari, S.; Gharibi, F.; Rahnama, M. in Computational fluid dynamics, Computer applications, Flow simulation

2017

Three-dimensional particulate flow has been simulated using the Lattice Boltzmann Method (LBM). Solid-fluid interaction was modeled with the Smoothed Profile Method (SPM) (Jafari et al., Lattice-Boltzmann method combined with smoothed-profile method for particulate suspensions, Phys. Rev. E, 2011). In this paper, a GPU code based on the three-dimensional lattice Boltzmann method and the smoothed profile method was developed to exploit the local, parallel nature of SPM-LBM. Results obtained for the sedimentation of one and two spherical particles, as well as their behavior in shear flow, showed excellent agreement with previously published works. Computations of the sedimentation of a large number of particles showed that the combination of LBM and SPM on a GPU platform can be considered an efficient and promising computational framework for particulate flow simulations.

Journal Article


Machine Learning and Deep Learning frameworks and libraries for large-scale data mining: a survey

By Bobák, Martin; Tran, Viet; Dlugolinsky, Stefan in Algorithms, Artificial intelligence, Big Data

2019

The combined impact of new computing resources and techniques with an increasing avalanche of large datasets is transforming many research areas and may lead to technological breakthroughs that can be used by billions of people. In recent years, Machine Learning and especially its subfield Deep Learning have seen impressive advances. Techniques developed within these two fields are now able to analyze and learn from huge amounts of real-world examples in disparate formats. While the number of Machine Learning algorithms is extensive and growing, so is the number of their implementations in frameworks and libraries. Software development in this field is fast paced, with a large amount of open-source software coming from academia, industry, start-ups, and wider open-source communities. This survey presents a recent time-slide comprehensive overview with comparisons as well as trends in development and usage of cutting-edge Artificial Intelligence software. It also provides an overview of massive parallelism support that is capable of scaling computation effectively and efficiently in the era of Big Data.

Journal Article


Full reconstruction of a 14-qubit state within four hours

By Li, Li; Xiang, Guo-Yong; Nori, Franco in 14-qubit, Algorithms, Complexity

2016

Full quantum state tomography (FQST) plays a unique role in the estimation of the state of a quantum system without a priori knowledge or assumptions. Unfortunately, since FQST requires informationally (over)complete measurements, both the number of measurement bases and the computational complexity of data processing suffer an exponential growth with the size of the quantum system. A 14-qubit entangled state has already been experimentally prepared in an ion trap, and the data processing capability for FQST of a 14-qubit state seems to be far away from practical applications. In this paper, the computational capability of FQST is pushed forward to reconstruct a 14-qubit state with a run time of only 3.35 hours using the linear regression estimation (LRE) algorithm, even when informationally overcomplete Pauli measurements are employed. The computational complexity of the LRE algorithm is first reduced from ∼10^19 to ∼10^15 for a 14-qubit state, by dropping all the zero elements, and its computational efficiency is further sped up by fully exploiting the parallelism of the LRE algorithm with parallel Graphic Processing Unit (GPU) programming. Our result demonstrates the effectiveness of using parallel computation to speed up the postprocessing for FQST, and can play an important role in quantum information technologies with large quantum systems.

Journal Article
