Catalogue Search | MBRL
9 result(s) for "Elsafi, Abubakar"
Evaluating Domain Randomization Techniques in DRL Agents: A Comparative Study of Normal, Randomized, and Non-Randomized Resets
2025
Domain randomization is a widely adopted technique in deep reinforcement learning (DRL) to improve agent generalization by exposing policies to diverse environmental conditions. This paper investigates the impact of different reset strategies (normal, non-randomized, and randomized) on agent performance using the Deep Deterministic Policy Gradient (DDPG) and Twin Delayed DDPG (TD3) algorithms within the CarRacing-v2 environment. Two experimental setups were conducted: an extended training regime with DDPG for 1000 steps per episode across 1000 episodes, and a fast execution setup comparing DDPG and TD3 for 30 episodes with 50 steps per episode under constrained computational resources. A step-based reward scaling mechanism was applied under the randomized reset condition to promote broader state exploration. Experimental results show that randomized resets significantly enhance learning efficiency and generalization, with DDPG demonstrating superior performance across all reset strategies. In particular, DDPG combined with randomized resets achieves the highest smoothed rewards (reaching approximately 15), best stability, and fastest convergence. These differences are statistically significant, as confirmed by t-tests: DDPG outperforms TD3 under randomized (t = −101.91, p < 0.0001), normal (t = −21.59, p < 0.0001), and non-randomized (t = −62.46, p < 0.0001) reset conditions. The findings underscore the critical role of reset strategy and reward shaping in enhancing the robustness and adaptability of DRL agents in continuous control tasks, particularly in environments where computational efficiency and training stability are crucial.
Journal Article
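The two mechanisms the abstract above describes (randomized environment resets and step-based reward scaling) can be sketched in a few lines. This is a minimal illustration, not the paper's implementation; the state bounds, bonus coefficient, and function names are assumptions.

```python
import random

def randomized_reset(state_dim=4, low=-0.05, high=0.05, rng=None):
    """Domain-randomized reset: draw each component of the initial
    state uniformly at random (illustrative bounds, not the paper's)."""
    rng = rng or random.Random()
    return [rng.uniform(low, high) for _ in range(state_dim)]

def scaled_reward(base_reward, step, max_steps=1000, bonus=0.01):
    """Step-based reward scaling: add a small bonus that grows with
    episode progress to encourage broader state exploration."""
    return base_reward + bonus * (step / max_steps)
```

Under a normal or non-randomized reset the environment would instead return a fixed initial state; the comparison in the paper varies only this reset behaviour while keeping the DDPG/TD3 training loop unchanged.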
Optimizing UCS Prediction Models through XAI-Based Feature Selection in Soil Stabilization
by
Badr, Atef
,
Mohammed, Ahmed Mohammed Awad
,
Hamdan, Mosab
in
Accuracy
,
Atterberg limits
,
Compressive strength
2026
Unconfined Compressive Strength (UCS) is a key parameter for assessing the stability and performance of stabilized soils, yet traditional laboratory testing is both time- and resource-intensive. In this study, an interpretable machine learning approach to UCS prediction is presented, pairing five models (Random Forest (RF), Gradient Boosting (GB), Extreme Gradient Boosting (XGB), CatBoost, and K-Nearest Neighbors (KNN)) with SHapley Additive exPlanations (SHAP) for enhanced interpretability and to guide feature removal. A complete dataset of 12 geotechnical and chemical parameters, including Atterberg limits, compaction properties, stabilizer chemistry, dosage, and curing time, was used to train and test the models. R2, RMSE, MSE, and MAE were used to assess performance. Initial results with all 12 features indicated that boosting-based models (GB, XGB, CatBoost) exhibited the highest predictive accuracy (R2 = 0.93) with satisfactory generalization on test data, followed by RF and KNN. SHAP analysis consistently identified CaO content, curing time, stabilizer dosage, and compaction parameters as the most important features, aligning with established soil stabilization mechanisms. Models were then re-trained on the top 8 and top 5 SHAP-ranked features. Interestingly, GB, XGB, and CatBoost maintained comparable accuracy with the reduced input sets, while RF was moderately sensitive and KNN performed somewhat better owing to the reduced dimensionality. The findings confirm that feature reduction through SHAP enables cost-effective UCS prediction by reducing laboratory test requirements without significant accuracy loss. The suggested hybrid approach offers an explainable, interpretable, and cost-effective tool for geotechnical engineering practice.
Journal Article
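The feature-reduction step described above, ranking features by mean absolute SHAP value and re-training on the top k, can be sketched as follows. This is a toy illustration: in the paper the attribution matrix would come from SHAP applied to the trained GB/XGB/CatBoost models, whereas here the values and feature names are placeholders.

```python
def rank_by_mean_abs_shap(shap_values, feature_names):
    """Rank features by mean absolute attribution across samples.
    shap_values: one row per sample, one attribution per feature."""
    cols = list(zip(*shap_values))
    mean_abs = [sum(abs(v) for v in col) / len(col) for col in cols]
    return sorted(zip(feature_names, mean_abs),
                  key=lambda p: p[1], reverse=True)

def top_k_features(shap_values, feature_names, k):
    """Keep the k highest-ranked features for re-training a reduced model."""
    ranked = rank_by_mean_abs_shap(shap_values, feature_names)
    return [name for name, _ in ranked[:k]]
```

Re-training on the returned subset is what lets the study trade a few laboratory-measured inputs for essentially unchanged accuracy.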
Extending DDPG with Physics-Informed Constraints for Energy-Efficient Robotic Control
by
Gabralla, Lubna A.
,
Mohammed Elhag, Arafat Abdulgader
,
Ahmed, Ali
in
Constraints
,
Energy consumption
,
Energy costs
2025
Energy efficiency stands as an essential factor when implementing deep reinforcement learning (DRL) policies for robotic control systems. Standard algorithms, including Deep Deterministic Policy Gradient (DDPG), primarily optimize task rewards but at the cost of excessively high energy consumption, making them impractical for real-world robotic systems. To address this limitation, we propose Physics-Informed DDPG (PI-DDPG), which integrates physics-based energy penalties to develop energy-efficient yet high-performing control policies. The proposed method introduces adaptive physics-informed constraints through a dynamic weighting factor, enabling policies that balance reward maximization with energy savings. Our motivation is to overcome the impracticality of reward-only optimization by designing controllers that achieve competitive performance while substantially reducing energy consumption. PI-DDPG was evaluated in nine MuJoCo continuous control environments, where it demonstrated significant improvements in energy efficiency without compromising stability or performance. Experimental results confirm that PI-DDPG substantially reduces energy consumption compared to standard DDPG, while maintaining competitive task performance. For instance, energy costs decreased from 5542.98 to 3119.02 in HalfCheetah-v4 and from 1909.13 to 1586.75 in Ant-v4, with stable performance in Hopper-v4 (205.95 vs. 130.82) and InvertedPendulum-v4 (322.97 vs. 311.29). Although DDPG sometimes yields higher rewards, such as in HalfCheetah-v4 (5695.37 vs. 4894.59), it requires significantly greater energy expenditure. These results highlight PI-DDPG as a promising energy-conscious alternative for robotic control.
Journal Article
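The core idea above, subtracting a weighted physics-based energy penalty from the task reward, can be sketched in a few lines. The energy proxy (sum of squared action magnitudes) and the annealing schedule for the weighting factor are plausible assumptions, not the paper's actual formulation.

```python
def physics_informed_reward(task_reward, action, lam=0.1):
    """Subtract a physics-based energy penalty from the task reward.
    Energy proxy: sum of squared action magnitudes (e.g., joint torques);
    lam is the dynamic weighting factor balancing reward vs. energy."""
    energy = sum(a * a for a in action)
    return task_reward - lam * energy

def anneal_lambda(lam, episode, rate=1e-3, lam_max=1.0):
    """One possible schedule for the adaptive weighting factor: grow the
    penalty weight slowly so early exploration is not over-penalized."""
    return min(lam_max, lam + rate * episode)
```

The rest of the DDPG machinery (actor, critic, replay buffer) is unchanged; only the reward fed to the critic is reshaped.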
A systematic mapping study on solving university timetabling problems using meta-heuristic algorithms
by
Bashab, Abeer
,
Ismail, Mohd Arfian
,
Elsafi, Abubakar
in
Adaptive algorithms
,
Algorithms
,
Artificial Intelligence
2020
Since university timetabling is commonly classified as a combinatorial optimisation problem, researchers tend to use optimisation approaches to reach the optimal timetable solution. Meta-heuristic algorithms have proven effective for this purpose over the last decade, and extensive literature studies have been published to date. However, a comprehensive systematic overview is missing. Therefore, this mapping study aimed to provide an organised view of the current state of the field and comprehensive awareness of meta-heuristic approaches for solving university timetabling problems. In addition, the mapping study sought to highlight the intensity of publications over recent years, identifying current trends and directions in the field, as well as providing guidance for future research by indicating the gaps and open questions to be addressed. Primary studies published over the last decade, from 2009 until the first quarter of 2020 and consisting of 131 publications, were selected as a benchmark for future research on solving university timetabling problems using meta-heuristic algorithms. Classified by method, hybrid methods account for the largest share of articles (32%), and within the distribution of meta-heuristic algorithms, hybrid algorithms again represent the highest application (31%). Likewise, the majority of the research consists of solution proposals (66%). The results of this study confirm the efficiency and intensive application of meta-heuristic algorithms, specifically hybrid algorithms, in solving university timetabling problems. Newer meta-heuristic algorithms, such as the grey wolf optimiser, the cat swarm optimisation algorithm, and elitist self-adaptive step-size search, hold high expectations for reliable and satisfying results and can be proposed to fill this gap.
Journal Article
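As a concrete illustration of the meta-heuristic framing surveyed above, here is a minimal simulated annealing sketch for a toy timetabling instance: courses are assigned to timeslots while minimizing clashes between conflicting pairs. The instance, move operator, and cooling schedule are illustrative assumptions, far simpler than the constraint models in the surveyed papers.

```python
import math
import random

def clashes(assignment, conflicts):
    """Count pairs of conflicting courses placed in the same timeslot."""
    return sum(1 for a, b in conflicts if assignment[a] == assignment[b])

def anneal_timetable(courses, slots, conflicts, iters=5000, t0=2.0, seed=0):
    """Simulated annealing sketch: repeatedly move one course to a random
    slot, accept worse moves with probability exp(-delta / T), cool T."""
    rng = random.Random(seed)
    cur = {c: rng.randrange(slots) for c in courses}
    cost, t = clashes(cur, conflicts), t0
    for _ in range(iters):
        cand = dict(cur)
        cand[rng.choice(courses)] = rng.randrange(slots)
        delta = clashes(cand, conflicts) - cost
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            cur, cost = cand, cost + delta
        t = max(1e-3, t * 0.999)  # geometric cooling with a floor
    return cur, cost
```

Hybrid methods, the dominant category in the study (32%), typically combine such a meta-heuristic with a local search or a second population-based algorithm.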
Comparative analysis of maximum power point tracking methods for power optimization in grid tied photovoltaic solar systems
by
Balfagih, Zain
,
Sabri, Saeed
,
Almohammedi, Akram A
in
Algorithms
,
Alternative energy sources
,
Comparative analysis
2025
The accelerating global shift toward renewable energy sources is largely attributed to increased investments and the rising demand for electricity, driven by technological progress, population growth, and escalating fuel prices associated with traditional power generation. Despite their benefits, photovoltaic energy systems face challenges such as sensitivity to fluctuations in solar irradiance and temperature, which lead to non-linear electrical behavior and reduced efficiency. In Iraq, for example, the World Bank reports a significant power distribution loss of approximately 51%. To mitigate these inefficiencies, this study introduces a grid-connected photovoltaic (PV) system employing Maximum Power Point Tracking (MPPT) techniques, specifically the Perturb and Observe (P&O) and Incremental Conductance (I&C) algorithms. These approaches aim to enhance energy extraction from PV arrays under dynamic environmental conditions. System modeling and performance evaluation were conducted using MATLAB/Simulink, focusing on optimizing output and regulating the DC–DC boost converter's switching frequency. Under varying irradiance (1000–250 W/m²) and temperature (25–50 °C) conditions, the I&C algorithm achieved a higher MPPT tracking efficiency of approximately 98.7%, compared to 95.2% for the P&O method. Additionally, I&C demonstrated faster convergence with a response time of 0.15 s and exhibited reduced power ripple (~1.2 kW) versus P&O (~3.8 kW), confirming its superior dynamic stability and steady-state performance.
Article highlights: A smart solar system setup was tested to see which method extracts energy more efficiently from sunlight. One method (I&C) provided faster and more stable power performance, especially under changing weather conditions. The results can help guide future solar system designs, especially in areas with hot climates like Iraq.
Journal Article
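The P&O algorithm compared above follows a simple rule: keep perturbing the voltage reference in the direction that increased PV output power, and reverse after overshooting the maximum power point. A minimal sketch of one iteration, with an illustrative step size rather than the study's tuned value:

```python
def perturb_and_observe(v_ref, prev_v, p, prev_p, step=0.5):
    """One P&O iteration on the voltage reference fed to the
    DC-DC converter controller."""
    dp, dv = p - prev_p, v_ref - prev_v
    if dp == 0:
        return v_ref            # power unchanged: hold the reference
    if (dp > 0) == (dv > 0):
        return v_ref + step     # power rose with the last perturbation
    return v_ref - step         # power fell: reverse the perturbation
```

The I&C method instead compares the incremental conductance dI/dV against -I/V at each step, which is generally credited with the lower ripple and faster settling the study reports, since it can detect when the operating point sits exactly at the maximum rather than oscillating around it.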
Parkinson's Detection Using RNN-Graph-LSTM with Optimization Based on Speech Signals
2022
Early detection of Parkinson's Disease (PD) from changes in patients' voices would enable intervention before physical symptoms are identified. Various machine learning algorithms have been developed to detect PD. Nevertheless, these ML methods lack generalization and suffer reduced classification performance due to subject overlap. To overcome these issues, this work applies a graph long short-term memory (GLSTM) model to classify the dynamic features of PD patients' speech signals. The proposed classification model is further improved by implementing a recurrent neural network (RNN) in the batch normalization layer of the GLSTM and optimizing with adaptive moment estimation (ADAM) on the network's hidden layers. Recognizing the importance of feature engineering, the proposed system uses Linear Discriminant Analysis (LDA) for dimensionality reduction and a Sparse Auto-Encoder (SAE) for extracting the dynamic speech features. Dynamic features are measured from the energy content of transitions from unvoiced to voiced speech (onset) and from voiced to voiceless speech (offset). The PD dataset is evaluated under 10-fold cross-validation without sample overlap. The proposed smart PD detection method, called RNN-GLSTM-ADAM, is evaluated numerically on persistent phonations in terms of accuracy, sensitivity, specificity, and Matthews correlation coefficient. The evaluated results show that RNN-GLSTM-ADAM markedly improves PD detection accuracy over static-feature-based conventional ML and DL approaches.
Journal Article
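The onset/offset feature idea above rests on frame-wise energy: a low-to-high energy crossing approximates an unvoiced-to-voiced transition (onset), and high-to-low approximates an offset. A minimal sketch with an assumed fixed threshold and non-overlapping frames, not the paper's actual extraction pipeline:

```python
def short_time_energy(signal, frame_len):
    """Energy of consecutive non-overlapping frames of a speech signal."""
    return [sum(x * x for x in signal[i:i + frame_len])
            for i in range(0, len(signal) - frame_len + 1, frame_len)]

def transitions(energies, threshold):
    """Frame indices where energy crosses the threshold: low-to-high
    crossings approximate onsets, high-to-low crossings offsets."""
    onsets, offsets = [], []
    for i in range(1, len(energies)):
        if energies[i - 1] < threshold <= energies[i]:
            onsets.append(i)
        elif energies[i - 1] >= threshold > energies[i]:
            offsets.append(i)
    return onsets, offsets
```

In the paper these transition regions feed the SAE, which extracts the dynamic features that the RNN-GLSTM-ADAM classifier consumes.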
Dynamic Trust Modulation and Human Oversight in AI-Driven AML Systems: A Conceptual Framework for Compliance
2025
This literature review investigates how human trust, decision fatigue, explainability (XAI), and human oversight interrelate to influence analyst decision-making in AI-driven anti-money laundering (AML) systems. While prior research has predominantly emphasized algorithmic performance, detection accuracy, or regulatory compliance in isolation, a critical gap remains in understanding the human-centered dynamics that shape real-world operational outcomes. Addressing this gap, the review examines how financial institutions navigate compliance demands and operational constraints, drawing on the Australian regulatory environment as an illustrative governance reference, including expectations articulated by AUSTRAC. Building on this synthesis, the study identifies structural gaps in Trust Calibration and oversight practices. It introduces a Dynamic Trust Modulation (DTM) framework to conceptualize how trust evolves across AML workflows. The framework models trust as a fluid, context-dependent construct shaped by system behavior, analyst workload, explainability mechanisms, and regulatory pressure. By framing trust, explainability, and decision fatigue as interdependent components of human–AI collaboration, this review advances a more holistic perspective on socio-technical system design in financial crime detection. The proposed framework contributes theoretically by extending human–AI trust research into the AML domain and practically by offering actionable design principles to enhance system accountability, decision defensibility, and adaptive compliance in operational AML environments.
Journal Article
Stakeholder management in value-based software development: systematic review
by
Ghazali, Masitah
,
Imran Babar, Muhammad
,
Elsafi, Abubakar
in
Computer programs
,
economic leverage
,
Economics
2014
In value-based software (VBS) development, an innovative idea is realised in order to gain economic leverage. VBS systems deal with financial streams, which distinguishes them from conventional systems. The success of a VBS system is associated with a valuable set of requirements, and valuable requirements can only be gathered from success-critical stakeholders. To select a set of success-critical stakeholders, researchers have presented various stakeholder identification and quantification (SIQ) approaches. No current approach can be adopted as a standard, since each employs different methods and processes. In this study, the aim is to find out the reported evidence-based attributes or characteristics of stakeholders and their usage context in terms of their application in different domains, stakeholders' quantification metrics, the reported stakeholder types, and the reported issues of VBS development. The standard systematic literature review guidelines given by Barbara Kitchenham are followed. The literature evidence shows that there is a need to explore all possible stakeholder attributes. Stakeholder metrics can be derived from stakeholder attributes, and a new SIQ framework can be proposed for VBS systems.
Journal Article
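One way the quantification metrics discussed above can be built from stakeholder attributes is a weighted-sum score used to rank candidates. This is a hypothetical illustration; the attribute names, weights, and scoring rule are assumptions, not a framework from the review.

```python
def stakeholder_score(attributes, weights):
    """Weighted-sum quantification: combine attribute ratings (0-1)
    into a single score; missing attributes count as 0."""
    total = sum(weights.values())
    return sum(w * attributes.get(a, 0.0) for a, w in weights.items()) / total

def success_critical(stakeholders, weights, top_n):
    """Rank stakeholders by score and keep the top_n as
    success-critical for requirements elicitation."""
    ranked = sorted(stakeholders.items(),
                    key=lambda kv: stakeholder_score(kv[1], weights),
                    reverse=True)
    return [name for name, _ in ranked[:top_n]]
```

An SIQ framework would additionally justify which attributes enter the score and how the weights are calibrated, which is exactly the evidence gap the review identifies.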