Catalogue Search | MBRL
35 result(s) for "Ibrahim, Khaled Z"
Exploring temporal community evolution: algorithmic approaches and parallel optimization for dynamic community detection
by Arifuzzaman, Shaikh; Ibrahim, Khaled Z.; Sattar, Naw Safrin
in Algorithms, Community detection, Community evolution
2023
Dynamic (temporal) graphs are a convenient mathematical abstraction for many practical complex systems including social contacts, business transactions, and computer communications. Community discovery is an extensively used graph analysis kernel with rich literature for static graphs. However, community discovery in a dynamic setting is challenging for two specific reasons. Firstly, the notion of temporal community lacks a widely accepted formalization, and only limited work exists on understanding how communities emerge over time. Secondly, the added temporal dimension along with the sheer size of modern graph data necessitates new scalable algorithms. In this paper, we investigate how communities evolve over time based on several graph metrics under a temporal formalization. We compare six different algorithmic approaches for dynamic community detection for their quality and runtime. We identify that a vertex-centric (local) optimization method works as efficiently as the classical modularity-based methods. To its advantage, such local computation allows for the efficient design of parallel algorithms without incurring a significant parallel overhead. Based on this insight, we design a shared-memory parallel algorithm, DyComPar, which demonstrates between 4 and 18 fold speed-up on a multi-core machine with 20 threads, for several real-world and synthetic graphs from different domains.
Journal Article
Unconventional nonlinear Hall effects in twisted multilayer 2D materials
by Choi, Min; Diéguez, Adrián Perez; Xu, Qiang
in 639/301/1019/385, 639/925/357/1018, Approximation
2025
We present the first investigation of unusual nonlinear Hall effects in twisted multilayer 2D materials. Contrary to expectations, our study shows that these nonlinear effects are not merely extensions of their monolayer counterparts. Instead, we find that stacking order and pairwise interactions between neighboring layers, mediated by Berry curvatures, play a pivotal role in shaping their collective nonlinear optical response. By combining large-scale Real-Time Time-Dependent Density Functional Theory (RT-TDDFT) simulations with model Hamiltonian analyses, we demonstrate a remarkable second-harmonic transverse response in hexagonal boron nitride four-layers, even in cases where the total Berry curvature cancels out. Furthermore, our symmetry analysis of the layered structures provides a simplified framework for predicting nonlinear responses in multilayer materials in general. Our investigation challenges the prevailing understanding of nonlinear optical responses in layered materials and opens new avenues for the design and development of advanced materials with tailored optical properties.
Journal Article
Performance analysis of deep learning workloads using roofline trajectories
by Ibrahim, Khaled Z.; Javed, M. Haseeb; Lu, Xiaoyi
in Algorithms, Artificial intelligence, Artificial neural networks
2019
Over the last decade, technologies derived from convolutional neural networks (CNNs), called Deep Learning applications, have revolutionized fields as diverse as cancer detection, self-driving cars, and virtual assistants. However, many users of such applications are not experts in Machine Learning itself. Consequently, there is limited knowledge among the community to run such applications in an optimized manner. The performance question for Deep Learning applications has typically been addressed by employing bespoke hardware (e.g., GPUs) better suited for such compute-intensive operations. However, such a degree of performance is only accessible at increasingly high financial costs, leaving only big corporations and governments with resources sufficient to employ them at a large scale. As a result, an average user is only left with access to commodity clusters with, in many cases, only CPUs as the sole processing element. For such users to make effective use of resources at their disposal, concerted efforts are necessary to figure out optimal hardware and software configurations. This study is one such step in this direction as we use the Roofline model to perform a systematic analysis of representative CNN models and identify opportunities for black-box and application-aware optimizations. Using the findings from our study, we are able to obtain up to 3.5× speedup compared to vanilla TensorFlow with default configurations.
Journal Article
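The Roofline model named in the abstract above bounds attainable performance by the lower of two ceilings: peak compute throughput, and memory bandwidth multiplied by arithmetic intensity (floating-point operations per byte moved). The following is a minimal sketch of that bound only, not the roofline-trajectory analysis of the paper. The machine figures and function names are hypothetical and chosen for illustration.

```python
def attainable_gflops(peak_gflops, bandwidth_gbs, arithmetic_intensity):
    """Roofline bound: the lower of the compute roof and the memory roof.

    arithmetic_intensity is in FLOPs per byte, so bandwidth * intensity
    has units of GFLOP/s when bandwidth is in GB/s.
    """
    return min(peak_gflops, bandwidth_gbs * arithmetic_intensity)


def ridge_point(peak_gflops, bandwidth_gbs):
    """Arithmetic intensity at which a kernel stops being memory-bound."""
    return peak_gflops / bandwidth_gbs


# Hypothetical CPU node (assumed numbers, not taken from the paper).
PEAK_GFLOPS = 1000.0
BANDWIDTH_GBS = 100.0

for intensity in (0.5, 10.0, 40.0):
    bound = attainable_gflops(PEAK_GFLOPS, BANDWIDTH_GBS, intensity)
    regime = ("memory-bound"
              if intensity < ridge_point(PEAK_GFLOPS, BANDWIDTH_GBS)
              else "compute-bound")
    print(f"intensity={intensity:5.1f} FLOP/B  bound={bound:7.1f} GFLOP/s  {regime}")
```

A kernel whose measured performance sits well below its bound at a given intensity is a candidate for tuning; one that sits on the memory roof can only improve by raising its arithmetic intensity.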
Implementation of fixed-nuclei polyatomic MCTDHF capability and the future with nuclear motion
by Haxton, Daniel J; Vecharynski, Eugene; Rescigno, Thomas N
in Absorption cross sections, Basis functions, Cartesian coordinates
2015
Synopsis: We discuss the implementation (https://commons.lbl.gov/display/csd/LBNL-AMO-MCTDHF) of Multiconfiguration Time-Dependent Hartree-Fock for polyatomic molecules using a Cartesian product grid of sinc basis functions, and present absorption cross sections and other results calculated with it.
Journal Article
Velocity-gauge real-time time-dependent density functional tight-binding for large-scale condensed matter systems
2024
We present a new velocity-gauge real-time, time-dependent density functional tight-binding (VG-rtTDDFTB) implementation in the open-source DFTB+ software package (https://dftbplus.org) for probing electronic excitations in large, condensed matter systems. Our VG-rtTDDFTB approach enables real-time electron dynamics simulations of large, periodic, condensed matter systems containing thousands of atoms with a favorable computational scaling as a function of system size. We provide computational details and benchmark calculations to demonstrate its accuracy and computational parallelizability on a variety of large material systems. As a representative example, we calculate laser-induced electron dynamics in a 512-atom amorphous silicon supercell to highlight the large periodic systems that can be examined with our implementation. Taken together, our VG-rtTDDFTB approach enables new electron dynamics simulations of complex systems that require large periodic supercells, such as crystal defects, complex surfaces, nanowires, and amorphous materials.
Dynamic Mode Decomposition for Extrapolating Non-equilibrium Green's Functions Dynamics
2023
The HF-GKBA offers an approximate numerical procedure for propagating the two-time non-equilibrium Green's function (NEGF). Here we compare the HF-GKBA to exact results for a variety of systems with long and short-range interactions, different two-body interaction strengths and various non-equilibrium preparations. We find excellent agreement between the HF-GKBA and exact time evolution in models when more realistic long-range exponentially decaying interactions are considered. This agreement persists for long times and for intermediate to strong interaction strengths. In large systems, HF-GKBA becomes prohibitively expensive for long-time evolutions. For this reason, we look at the use of dynamic mode decomposition (DMD) to reconstruct long-time NEGF trajectories from a sample of the initial trajectory. Using no more than 16% of the total time evolution, we reconstruct the total trajectory with high fidelity. Our results show the potential for DMD to be used in conjunction with HF-GKBA to calculate long-time trajectories in large-scale systems.
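The extrapolation idea in the abstract above can be illustrated with a generic DMD sketch: fit a low-rank linear propagator to the first part of a trajectory, then advance it beyond the fitted window. This is not the authors' implementation. The synthetic damped-oscillator signal stands in for NEGF data, and the function names, the rank, and the 16% training window (chosen to echo the figure quoted in the abstract) are assumptions.

```python
import numpy as np


def dmd_fit(snapshots, rank):
    """Fit exact DMD to snapshots of shape (n_features, n_times).

    Columns are assumed to be equally spaced in time.
    """
    X, Y = snapshots[:, :-1], snapshots[:, 1:]
    U, s, Vh = np.linalg.svd(X, full_matrices=False)
    U, s, Vh = U[:, :rank], s[:rank], Vh[:rank]
    V = Vh.conj().T
    # Reduced propagator: U^H Y V S^-1 (division scales columns by 1/s).
    A_tilde = U.conj().T @ Y @ V / s
    eigvals, W = np.linalg.eig(A_tilde)
    modes = (Y @ V / s) @ W
    # Mode amplitudes that best reproduce the first snapshot.
    x0 = snapshots[:, 0].astype(complex)
    amplitudes = np.linalg.lstsq(modes, x0, rcond=None)[0]
    return eigvals, modes, amplitudes


def dmd_predict(eigvals, modes, amplitudes, n_steps):
    """Evaluate the fitted model at time steps 0 .. n_steps - 1."""
    powers = eigvals[None, :] ** np.arange(n_steps)[:, None]
    return (modes * amplitudes) @ powers.T


# Synthetic trajectory: two damped oscillations (illustrative data only).
t = 0.1 * np.arange(200)
full = np.vstack([
    np.exp(-0.05 * t) * np.cos(t),
    np.exp(-0.05 * t) * np.sin(t),
    np.exp(-0.02 * t) * np.cos(2 * t),
    np.exp(-0.02 * t) * np.sin(2 * t),
])

n_train = int(0.16 * full.shape[1])  # first 16% of the time steps
eigvals, modes, amplitudes = dmd_fit(full[:, :n_train], rank=4)
predicted = dmd_predict(eigvals, modes, amplitudes, full.shape[1]).real
max_error = np.max(np.abs(predicted - full))
print(f"trained on {n_train} of {full.shape[1]} steps, max error {max_error:.2e}")
```

The toy signal is exactly linear in time, so the fit is essentially exact here; real trajectories would need a rank chosen from the singular-value spectrum and a check of the error on held-out time steps.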
An Evaluation of Real-time Adaptive Sampling Change Point Detection Algorithm using KCUSUM
by Hubertus Van Dam; Yoo, Shinjae; Perry Siehien
in Adaptive sampling, Algorithms, Change detection
2024
Detecting abrupt changes in real-time data streams from scientific simulations presents a challenging task, demanding the deployment of accurate and efficient algorithms. Identifying change points in a live data stream involves continuous scrutiny of incoming observations for deviations in their statistical characteristics, particularly in high-volume data scenarios. Maintaining a balance between sudden change detection and minimizing false alarms is vital. Many existing algorithms for this purpose rely on known probability distributions, limiting their feasibility. In this study, we introduce the Kernel-based Cumulative Sum (KCUSUM) algorithm, a non-parametric extension of the traditional Cumulative Sum (CUSUM) method, which has gained prominence for its efficacy in online change point detection under less restrictive conditions. KCUSUM distinguishes itself by comparing incoming samples directly with reference samples and computing a statistic grounded in the Maximum Mean Discrepancy (MMD) non-parametric framework. This approach extends KCUSUM's pertinence to scenarios where only reference samples are available, such as atomic trajectories of proteins in vacuum, facilitating the detection of deviations from the reference sample without prior knowledge of the data's underlying distribution. Furthermore, by harnessing MMD's inherent random-walk structure, we can theoretically analyze KCUSUM's performance across various use cases, including metrics like expected delay and mean runtime to false alarms. Finally, we discuss real-world use cases from scientific simulations such as NWChem CODAR and protein folding data, demonstrating KCUSUM's practical effectiveness in online change point detection.
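A kernel-based cumulative-sum detector of the kind described above can be sketched in a few lines: pairs of incoming samples are compared with pairs of reference samples through an MMD-style kernel statistic, and the result is accumulated in a CUSUM recursion that resets at zero. The abstract does not give the update rule, so the details below are assumptions: the Gaussian kernel, the drift term, cycling through the reference samples (rather than drawing them at random), and all function names are illustrative only.

```python
import math


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel on scalars."""
    return math.exp(-((x - y) ** 2) / (2.0 * bandwidth ** 2))


def mmd_increment(x1, x2, y1, y2, kernel=gaussian_kernel):
    """Unbiased squared-MMD estimate from one reference pair and one data pair."""
    return kernel(x1, x2) + kernel(y1, y2) - kernel(x1, y2) - kernel(x2, y1)


def kcusum_detect(stream, reference, threshold, drift=0.1):
    """Return the stream index at which an alarm is raised, or None.

    The statistic is updated once per pair of incoming observations. The
    drift term keeps it near zero while the stream matches the reference.
    """
    stat = 0.0
    for n in range(1, len(stream), 2):
        y1, y2 = stream[n - 1], stream[n]
        # Assumption: cycle deterministically through the reference samples.
        x1 = reference[(n - 1) % len(reference)]
        x2 = reference[n % len(reference)]
        stat = max(0.0, stat + mmd_increment(x1, x2, y1, y2) - drift)
        if stat > threshold:
            return n
    return None


# Illustrative data: the stream matches the reference, then jumps at index 20.
reference = [0.0] * 10
stream = [0.0] * 20 + [5.0] * 20
alarm_index = kcusum_detect(stream, reference, threshold=5.0)
print("alarm raised at stream index:", alarm_index)
```

Raising the threshold lengthens the detection delay and lowers the false-alarm rate, which is the trade-off the abstract refers to.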
Cost-Effective Methodology for Complex Tuning Searches in HPC: Navigating Interdependencies and Dimensionality
by Choi, Min; Adrian Perez Dieguez; Mauro Del Ben
in Complexity, Cost analysis, Decision analysis
2024
Tuning searches are pivotal in High-Performance Computing (HPC), addressing complex optimization challenges in computational applications. The complexity arises not only from finely tuning parameters within routines but also from potential interdependencies among them, rendering traditional optimization methods inefficient. Instead of scrutinizing interdependencies among parameters and routines, practitioners often face the dilemma of conducting independent tuning searches for each routine, thereby overlooking interdependence, or pursuing a more resource-intensive joint search for all routines. This decision is driven by the consideration that some interdependence analysis and high-dimensional decomposition techniques in the literature may be prohibitively expensive in HPC tuning searches. Our methodology adapts and refines these methods to ensure computational feasibility while maximizing performance gains in real-world scenarios. Our methodology leverages a cost-effective interdependence analysis to decide whether to merge several tuning searches into a joint search or conduct orthogonal searches. Tested on synthetic functions with varying levels of parameter interdependence, our methodology efficiently explores the search space. In comparison to Bayesian-optimization-based fully independent or fully joint searches, our methodology suggested an optimized breakdown of independent and merged searches that led to final configurations up to 8% more accurate, reducing the search time by up to 95%. When applied to GPU-offloaded Real-Time Time-Dependent Density Functional Theory (RT-TDDFT), an application in computational materials science that challenges modern HPC autotuners, our methodology achieved an effective tuning search. Its adaptability and efficiency extend beyond RT-TDDFT, making it valuable for related applications in HPC.
Sparse-Stochastic Fragmented Exchange for Large-Scale Hybrid TDDFT Calculations
by Sereda, Mykola; Neuhauser, Daniel; Tucker, Allen
in Absorption spectra, Density functional theory, Exchanging
2024
We extend our recently developed sparse-stochastic fragmented exchange formalism for ground-state hybrid DFT (ngH-DFT) to calculate absorption spectra within linear-response time-dependent Generalized Kohn-Sham DFT (LR-GKS-TDDFT), for systems consisting of thousands of valence electrons within a grid-based/plane-wave representation. A mixed deterministic/fragmented-stochastic compression of the exchange kernel, here using long-range explicit exchange functionals, provides an efficient method for accurate optical spectra. Both real-time propagation as well as frequency-resolved Casida-equation-type approaches for spectra are presented, and the method is applied to large molecular dyes.
Scalable Training of Trustworthy and Energy-Efficient Predictive Graph Foundation Models for Atomistic Materials Modeling: A Case Study with HydraGNN
by Mehta, Kshitij; Rogers, David; Jorda Polo
in Algorithms, Artificial neural networks, Ensemble learning
2024
We present our work on developing and training scalable, trustworthy, and energy-efficient predictive graph foundation models (GFMs) using HydraGNN, a multi-headed graph convolutional neural network architecture. HydraGNN expands the boundaries of graph neural network (GNN) computations in both training scale and data diversity. It abstracts over message passing algorithms, allowing both reproduction of and comparison across algorithmic innovations that define nearest-neighbor convolution in GNNs. This work discusses a series of optimizations that have allowed scaling up GFM training to tens of thousands of GPUs on datasets consisting of hundreds of millions of graphs. Our GFMs use multi-task learning (MTL) to simultaneously learn graph-level and node-level properties of atomistic structures, such as energy and atomic forces. Using over 154 million atomistic structures for training, we illustrate the performance of our approach along with the lessons learned on two state-of-the-art United States Department of Energy (US-DOE) supercomputers, namely the Perlmutter petascale system at the National Energy Research Scientific Computing Center and the Frontier exascale system at the Oak Ridge Leadership Computing Facility. The HydraGNN architecture enables the GFM to achieve near-linear strong scaling performance using more than 2,000 GPUs on Perlmutter and 16,000 GPUs on Frontier.