Catalogue Search | MBRL

A simulation-deep reinforcement learning (SiRL) approach for epidemic control optimization

in Agent-based models , COVID-19 vaccines , Decision making

2023

In this paper, we address the controversies of epidemic control planning by developing a novel Simulation-Deep Reinforcement Learning (SiRL) model. COVID-19 reminded constituents around the world that government decision-making could change their lives. During the COVID-19 pandemic, governments were concerned with reducing fatalities as the virus spread while also keeping the economy running. We address epidemic decision-making regarding the interventions necessary given the state of the epidemic and the objective of the decision-maker. Further, we compare different vaccination strategies, such as age-based and random vaccination, to shed light on who should get priority in the vaccination process. To address these issues, we propose a simulation-deep reinforcement learning (DRL) framework. This framework is composed of an agent-based simulation model and a governor DRL agent that can enforce interventions in the agent-based simulation environment. Computational results show that our DRL agent can learn effective strategies and suggest optimal actions given a specific epidemic situation based on a multi-objective reward structure. We compare our DRL agent’s decisions to government interventions at different periods of time during the COVID-19 pandemic. Our results suggest that more could have been done to control the epidemic. In addition, if a random vaccination strategy that allows super-spreaders to get vaccinated early had been used, infections would have been reduced by 32% at the expense of 4% more deaths. We also show that a behavioral change of fully quarantining 10% of the risky individuals, combined with a random vaccination strategy, reduces the death toll by 14% compared to the age-based vaccination strategy that was implemented and by 27% compared to the New Jersey reported data. We also demonstrate that our approach can be applied to other locations by validating and applying our model to the COVID-19 case in the state of Kansas.

Journal Article

Multi-Objective Optimization of Energy Saving and Throughput in Heterogeneous Networks Using Deep Reinforcement Learning

by Kim, Wooseong , Ryu, Kyungho in Algorithms , Connectivity , Deep learning

2021

Wireless networking using GHz or THz spectra has encouraged mobile service providers to deploy small cells to improve link quality and cell capacity using mmWave backhaul links. As green networking for lower CO2 emissions is mandatory to confront global climate change, we need energy-efficient network management for such denser small-cell heterogeneous networks (HetNets), which already suffer from observable power consumption. We establish a dual-objective optimization model that minimizes energy consumption by switching off unused small cells while maximizing user throughput, which is a mixed-integer linear problem (MILP). Recently, deep reinforcement learning (DRL) has been applied to many NP-hard problems in wireless networking, such as radio resource allocation, association, and power saving, where it can produce a near-optimal solution with fast inference time as an online solution. In this paper, we investigate the feasibility of DRL for a dual-objective problem, energy-efficient routing and throughput maximization, which has not been explored before. We propose a proximal policy optimization (PPO)-based multi-objective algorithm using the actor-critic model, realized as an optimistic linear support framework in which the PPO algorithm searches for feasible solutions iteratively. Experimental results show that our algorithm can achieve throughput and energy savings comparable to those of CPLEX.

Journal Article

Intelligent Traffic Control Strategies for VLC-Connected Vehicles and Pedestrian Flow Management

by Vieira, Manuel Augusto , Galvão, Gonçalo , Louro, Paula in Artificial intelligence , autonomous vehicles , Communication

2025

Urban traffic congestion leads to daily delays, driven by outdated, rigid control systems. As vehicle numbers grow, fixed-phase signals struggle to adapt to real-time conditions. This work presents a decentralized Multi-Agent Reinforcement Learning (MARL) system to manage a traffic cell composed of five intersections, introducing the novel Strategic Anti-Blocking Phase Adjustment (SAPA) module, developed to enable dynamic phase time adjustments. The goal is to optimize arterial traffic flow by adapting strategies to different traffic generation patterns, simulating priority movements along circular or radial arterials, such as inbound or outbound city flows. The system aims to manage diverse scenarios within a cell, with the long-term goal of scaling to city-wide networks. A Visible Light Communication (VLC) infrastructure is integrated to support real-time data exchange between vehicles and infrastructure, capturing vehicle position, speed, and pedestrian presence at intersections. The system is evaluated through multiple performance metrics, showing promising results: reduced vehicle queues and waiting times, increased average speeds, and improved pedestrian safety and overall flow management. These outcomes demonstrate the system’s potential to deliver adaptive, intelligent traffic control for complex urban environments.

Journal Article

Visual Pretraining via Contrastive Predictive Model for Pixel-Based Reinforcement Learning

by Luu, Tung M. , Nguyen, Thanh , Vu, Thang in Algorithms , Analysis , Computational linguistics

2022

In an attempt to overcome the limitations of reward-driven representation learning in vision-based reinforcement learning (RL), an unsupervised learning framework referred to as the visual pretraining via contrastive predictive model (VPCPM) is proposed to learn the representations detached from the policy learning. Our method enables the convolutional encoder to perceive the underlying dynamics through a pair of forward and inverse models under the supervision of the contrastive loss, thus resulting in better representations. In experiments with a diverse set of vision control tasks, by initializing the encoders with VPCPM, the performance of state-of-the-art vision-based RL algorithms is significantly boosted, with 44% and 10% improvement for RAD and DrQ at 100 steps, respectively. In comparison to the prior unsupervised methods, the performance of VPCPM matches or outperforms all the baselines. We further demonstrate that the learned representations successfully generalize to the new tasks that share a similar observation and action space.

Journal Article

Toward robust and scalable deep spiking reinforcement learning

by Walter, Florian , Ergene, Deniz , Knoll, Alois in Algorithms , Back propagation , continuous control

2023

Deep reinforcement learning (DRL) combines reinforcement learning algorithms with deep neural networks (DNNs). Spiking neural networks (SNNs) have been shown to be a biologically plausible and energy-efficient alternative to DNNs. Since the introduction of surrogate gradient approaches that overcome the discontinuity in the spike function, SNNs can be trained with the backpropagation through time (BPTT) algorithm. While SNNs have been largely explored on supervised learning problems, little work has been done on investigating their use as function approximators in DRL. Here we show how SNNs can be applied to different DRL algorithms such as Deep Q-Network (DQN) and Twin-Delayed Deep Deterministic Policy Gradient (TD3) for discrete and continuous action space environments, respectively. We found that SNNs are sensitive to the additional hyperparameters introduced by spiking neuron models, such as current and voltage decay factors and firing thresholds, and that extensive hyperparameter tuning is inevitable. However, we show that increasing the simulation time of SNNs, as well as applying a two-neuron encoding to the input observations, helps reduce the sensitivity to the membrane parameters. Furthermore, we show that randomizing the membrane parameters, instead of selecting uniform values for all neurons, has stabilizing effects on the training. We conclude that SNNs can be utilized for learning complex continuous control problems with state-of-the-art DRL algorithms. While the training complexity increases, the resulting SNNs can be directly executed on neuromorphic processors and potentially benefit from their high energy efficiency.

Journal Article

A Usage Aware Dynamic Spectrum Access Scheme for Interweave Cognitive Radio Network by Exploiting Deep Reinforcement Learning

by Ji, Yusheng , Zhou, Hao , Teraki, Yuto in Access control , Access control (Computers) , channel usage aware

2022

Future-generation wireless networks should accommodate surging growth in mobile data traffic and support an increasingly high density of wireless devices. Consequently, as the demand for spectrum continues to skyrocket, the shortage of spectrum resources for wireless networks will become unprecedentedly severe in the near future. To deal with the emerging spectrum-shortage problem, dynamic spectrum access techniques have attracted a great deal of attention in both academia and industry. By exploiting cognitive radio techniques, secondary users (SUs) are capable of accessing the underutilized spectrum holes of the primary users (PUs) to increase the whole system’s spectral efficiency with minimum interference violations. In this paper, we mathematically formulate the spectrum access problem for interweave cognitive radio networks and propose a usage-aware deep reinforcement learning based scheme to solve it, which exploits the historical channel usage data to learn the time correlation and channel correlation of the PU channels. We evaluated the performance of the proposed approach by extensive simulations in both uncorrelated and correlated PU channel usage cases. The evaluation results validate the superiority of the proposed scheme in terms of channel access success probability and SU-PU interference probability, by comparing it with ideal results and existing methods.

Journal Article

Learning to trade in financial time series using high-frequency through wavelet transformation and deep reinforcement learning

by Koh Hayeong , Lee, Jimin , Choe Hi Jun in Composition , Decomposition , Deep learning

2021

Deep learning-based financial approaches have received attention from both investors and researchers. This study demonstrates how to optimize portfolios, asset allocation, and trading systems based on deep reinforcement learning using three frameworks. In the proposed deep learning structure, the input data are first decomposed through wavelet transformation (WT) to remove noise from stock price time-series data. Then, only the mother wavelet (high-frequency) data are used as input. Second, reinforcement learning is performed using the high-frequency data. The reinforcement learning network employs long short-term memory (LSTM). Actions are determined by the LSTM network or randomly. Third, it learns the optimal investment trading system using the actions of a given transaction and appropriate rewards. The structure of the optimal investment trading system obtained by the proposed deep reinforcement learning structure improves trading performance without requiring the construction of a predictive model. To investigate the performance of the proposed structure, we applied the S&P500, DJI, and KOSPI200 indices to the proposed structure (HW_LSTM_RL) and other reinforcement learning structures for comparison. We evaluated the difference in Sharpe ratio for various test periods (one to three years) and for different rewards. Using the decomposed high-frequency data as input, a portfolio of investment transactions was improved for highly volatile markets. In deep reinforcement learning, we found that network composition and appropriate rewards have significant influence on learning transactions in financial time-series data. Thus, the proposed HW_LSTM_RL structure demonstrates the importance of input data composition, learning network settings, and rewards.
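The decomposition step described in this abstract can be illustrated with a minimal sketch. It assumes a single-level Haar wavelet, which is an illustrative choice only: the abstract does not name the wavelet family or the number of decomposition levels, and the function name `haar_detail` is hypothetical. The sketch returns only the high-frequency (detail) coefficients of a price series, the kind of component the abstract says is passed on as input.

```python
import numpy as np

def haar_detail(prices):
    """Return single-level Haar detail (high-frequency) coefficients.

    Assumption: Haar wavelet, one level; the catalogued paper may use a
    different wavelet and depth.
    """
    x = np.asarray(prices, dtype=float)
    if x.size % 2:
        # Drop the last sample so the series splits into whole pairs.
        x = x[:-1]
    even, odd = x[0::2], x[1::2]
    # Each detail coefficient is the scaled difference within a pair.
    return (even - odd) / np.sqrt(2.0)

if __name__ == "__main__":
    series = [100.0, 101.0, 103.0, 102.0, 102.0, 105.0]
    print(haar_detail(series))
```

A flat series yields all-zero detail coefficients, so the output carries only the local fluctuations of the input and none of its level.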

Journal Article

Enhancing Building Energy Management: Adaptive Edge Computing for Optimized Efficiency and Inhabitant Comfort

by Calvo-Gallego, Jaime , Houchati, Mahdi , Hernandez Fernandez, Javier in Algorithms , Analysis , Architecture and energy conservation

2023

In contemporary building and energy management systems (BEMSs), the predominant approach involves rule-based methodologies, typically employing supervised or unsupervised learning, to deliver energy-saving recommendations to building occupants. However, these BEMSs often suffer from a critical limitation: they are primarily trained on building energy data alone, disregarding crucial elements such as occupant comfort and preferences. This inherent lack of adaptability to occupants significantly hampers the effectiveness of energy-saving solutions. Moreover, the prevalent cloud-based nature of these systems introduces elevated cybersecurity risks and substantial data transmission overheads. In response to these challenges, this article introduces an advanced edge computing architecture grounded in virtual organizations, federated learning, and deep reinforcement learning algorithms, tailored to optimize energy consumption within buildings and homes and to facilitate demand response. By integrating energy efficiency measures within virtual organizations, which dynamically learn from real-time inhabitant data while prioritizing comfort, our approach effectively optimizes inhabitant consumption patterns, ushering in a new era of energy efficiency in the built environment.

Journal Article

Deep-Reinforcement-Learning-Based Object Transportation Using Task Space Decomposition

by Gyuho Eoh in Algorithms , Chemical technology , Curricula

2023

This paper presents a novel object transportation method using deep reinforcement learning (DRL) and the task space decomposition (TSD) method. Most previous studies on DRL-based object transportation worked well only in the specific environment where a robot learned how to transport an object. Another drawback was that DRL only converged in relatively small environments. This is because the existing DRL-based object transportation methods are highly dependent on learning conditions and training environments; they cannot be applied to large and complicated environments. Therefore, we propose a new DRL-based object transportation method that uses TSD to decompose a difficult task space into multiple simple sub-task spaces. First, a robot sufficiently learned how to transport an object in a standard learning environment (SLE) that has small and symmetric structures. Then, the whole task space was decomposed into several sub-task spaces by considering the size of the SLE, and we created sub-goals for each sub-task space. Finally, the robot transported an object by sequentially occupying the sub-goals. The proposed method can be extended to a large and complicated new environment as well as the training environment without additional learning or re-learning. Simulations in different environments, such as a long corridor, polygons, and a maze, are presented to verify the proposed method.
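The sub-goal idea in this abstract can be sketched in a few lines. The sketch assumes an obstacle-free 2D plane and places sub-goals on the straight line from start to goal so that consecutive sub-goals are no farther apart than the SLE size. The function name `make_subgoals` and the straight-line placement are hypothetical simplifications, not the paper's decomposition procedure, which also has to handle corridors and mazes.

```python
import math

def make_subgoals(start, goal, sle_size):
    """Split the path from start to goal into evenly spaced sub-goals.

    Assumption: free space, straight-line path; each segment is at most
    sle_size long so every sub-task fits inside the learned environment.
    """
    dx, dy = goal[0] - start[0], goal[1] - start[1]
    dist = math.hypot(dx, dy)
    # Number of segments needed so that no segment exceeds sle_size.
    n = max(1, math.ceil(dist / sle_size))
    # The last sub-goal (k == n) coincides with the final goal.
    return [(start[0] + dx * k / n, start[1] + dy * k / n)
            for k in range(1, n + 1)]

if __name__ == "__main__":
    for subgoal in make_subgoals((0.0, 0.0), (10.0, 0.0), 4.0):
        print(subgoal)
```

A policy trained only in the small environment could then be invoked once per sub-goal in sequence, which is the sense in which no re-learning is needed in the larger space.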

Journal Article

Power Allocation and Energy Cooperation for UAV-Enabled MmWave Networks: A Multi-Agent Deep Reinforcement Learning Approach

by Domingo, Mari Carmen in Algorithms , Alternative energy sources , Computer Simulation

2021

Unmanned Aerial Vehicle (UAV)-assisted cellular networks over the millimeter-wave (mmWave) frequency band can meet the requirements of a high data rate and flexible coverage in next-generation communication networks. However, higher propagation loss and the use of a large number of antennas in mmWave networks give rise to high energy consumption, and UAVs are constrained by their low-capacity onboard batteries. Energy harvesting (EH) is a viable solution to reduce the energy cost of UAV-enabled mmWave networks. However, the random nature of renewable energy makes it challenging to maintain robust connectivity in UAV-assisted terrestrial cellular networks. Energy cooperation allows UAVs to send their excess energy to other UAVs that are low on energy. In this paper, we propose a power allocation algorithm based on energy harvesting and energy cooperation to maximize the throughput of a UAV-assisted mmWave cellular network. Since there is channel-state uncertainty and the amount of harvested energy can be treated as a stochastic process, we propose an optimal multi-agent deep reinforcement learning (DRL) algorithm named Multi-Agent Deep Deterministic Policy Gradient (MADDPG) to solve the renewable energy resource allocation problem for throughput maximization. The simulation results show that the proposed algorithm outperforms the Random Power (RP), Maximal Power (MP), and value-based Deep Q-Learning (DQL) algorithms in terms of network throughput.

Journal Article
