
5,459 result(s) for "deep reinforcement learning"
A simulation-deep reinforcement learning (SiRL) approach for epidemic control optimization
In this paper, we address the controversies of epidemic control planning by developing a novel Simulation-Deep Reinforcement Learning (SiRL) model. COVID-19 reminded constituents around the world that government decision-making could change their lives. During the COVID-19 pandemic, governments were concerned with reducing fatalities as the virus spread while also maintaining a flowing economy. In this paper, we address epidemic decision-making regarding the interventions necessary given the situation of the epidemic, based on the purpose of the decision-maker. Further, we compare different vaccination strategies, such as age-based and random vaccination, to shed light on who should get priority in the vaccination process. To address these issues, we propose a simulation-deep reinforcement learning (DRL) framework. This framework is composed of an agent-based simulation model and a governor DRL agent that can enforce interventions in the agent-based simulation environment. Computational results show that our DRL agent can learn effective strategies and suggest optimal actions given a specific epidemic situation based on a multi-objective reward structure. We compare our DRL agent’s decisions to government interventions at different periods of time during the COVID-19 pandemic. Our results suggest that more could have been done to control the epidemic. In addition, if a random vaccination strategy that allows super-spreaders to get vaccinated early had been used, infections would have been reduced by 32% at the expense of 4% more deaths. We also show that a behavioral change of fully quarantining 10% of the risky individuals and using a random vaccination strategy leads to a reduction of the death toll by 14% and 27% compared to the age-based vaccination strategy that was implemented and the New Jersey reported data, respectively. We have also demonstrated the flexibility of our approach to be applied to other locations by validating and applying our model to the COVID-19 case in the state of Kansas.
Multi-Objective Optimization of Energy Saving and Throughput in Heterogeneous Networks Using Deep Reinforcement Learning
Wireless networking using GHz or THz spectra has encouraged mobile service providers to deploy small cells to improve link quality and cell capacity using mmWave backhaul links. As green networking for lower CO2 emissions is mandatory to confront global climate change, we need energy-efficient network management for such denser small-cell heterogeneous networks (HetNets) that already suffer from observable power consumption. We establish a dual-objective optimization model that minimizes energy consumption by switching off unused small cells while maximizing user throughput, which is a mixed integer linear problem (MILP). Recently, the deep reinforcement learning (DRL) algorithm has been applied to many NP-hard problems in the wireless networking field, such as radio resource allocation, association and power saving, where it can produce a near-optimal solution with fast inference time as an online solution. In this paper, we investigate the feasibility of the DRL algorithm for a dual-objective problem, energy-efficient routing and throughput maximization, which has not been explored before. We propose a proximal policy optimization (PPO)-based multi-objective algorithm using the actor-critic model that is realized as an optimistic linear support framework in which the PPO algorithm searches for feasible solutions iteratively. Experimental results show that our algorithm can achieve throughput and energy savings comparable to those of CPLEX.
Visual Pretraining via Contrastive Predictive Model for Pixel-Based Reinforcement Learning
In an attempt to overcome the limitations of reward-driven representation learning in vision-based reinforcement learning (RL), an unsupervised learning framework referred to as the visual pretraining via contrastive predictive model (VPCPM) is proposed to learn representations detached from policy learning. Our method enables the convolutional encoder to perceive the underlying dynamics through a pair of forward and inverse models under the supervision of the contrastive loss, thus resulting in better representations. In experiments with a diverse set of vision control tasks, initializing the encoders with VPCPM significantly boosts the performance of state-of-the-art vision-based RL algorithms, with 44% and 10% improvement for RAD and DrQ at 100 steps, respectively. In comparison to prior unsupervised methods, the performance of VPCPM matches or outperforms all the baselines. We further demonstrate that the learned representations successfully generalize to new tasks that share a similar observation and action space.
Toward robust and scalable deep spiking reinforcement learning
Deep reinforcement learning (DRL) combines reinforcement learning algorithms with deep neural networks (DNNs). Spiking neural networks (SNNs) have been shown to be a biologically plausible and energy-efficient alternative to DNNs. Since the introduction of surrogate gradient approaches that overcome the discontinuity in the spike function, SNNs can be trained with the backpropagation through time (BPTT) algorithm. While SNNs have been largely explored on supervised learning problems, little work has been done on investigating their use as function approximators in DRL. Here we show how SNNs can be applied to different DRL algorithms like Deep Q-Network (DQN) and Twin-Delayed Deep Deterministic Policy Gradient (TD3) for discrete and continuous action space environments, respectively. We found that SNNs are sensitive to the additional hyperparameters introduced by spiking neuron models, such as current and voltage decay factors and firing thresholds, and that extensive hyperparameter tuning is inevitable. However, we show that increasing the simulation time of SNNs, as well as applying a two-neuron encoding to the input observations, helps reduce the sensitivity to the membrane parameters. Furthermore, we show that randomizing the membrane parameters, instead of selecting uniform values for all neurons, has stabilizing effects on the training. We conclude that SNNs can be utilized for learning complex continuous control problems with state-of-the-art DRL algorithms. While the training complexity increases, the resulting SNNs can be directly executed on neuromorphic processors and potentially benefit from their high energy efficiency.
A Usage Aware Dynamic Spectrum Access Scheme for Interweave Cognitive Radio Network by Exploiting Deep Reinforcement Learning
Future-generation wireless networks should accommodate surging growth in mobile data traffic and support an increasingly high density of wireless devices. Consequently, as the demand for spectrum continues to skyrocket, the shortage of spectrum resources for wireless networks will become unprecedentedly severe in the near future. To deal with the emerging spectrum-shortage problem, dynamic spectrum access techniques have attracted a great deal of attention in both academia and industry. By exploiting cognitive radio techniques, secondary users (SUs) are capable of accessing the underutilized spectrum holes of the primary users (PUs) to increase the whole system’s spectral efficiency with minimum interference violations. In this paper, we mathematically formulate the spectrum access problem for interweave cognitive radio networks, and propose a usage-aware deep reinforcement learning based scheme to solve it, which exploits the historical channel usage data to learn the time correlation and channel correlation of the PU channels. We evaluated the performance of the proposed approach by extensive simulations in both uncorrelated and correlated PU channel usage cases. The evaluation results validate the superiority of the proposed scheme in terms of channel access success probability and SU-PU interference probability, by comparing it with ideal results and existing methods.
Learning to trade in financial time series using high-frequency through wavelet transformation and deep reinforcement learning
Deep learning-based financial approaches have received attention from both investors and researchers. This study demonstrates how to optimize portfolios, asset allocation, and trading systems based on deep reinforcement learning using three frameworks. First, in the proposed deep learning structure, the input data are decomposed through wavelet transformation (WT) to remove noise from stock price time-series data, and only the mother wavelet (high-frequency) data are used as input. Second, reinforcement learning is performed using the high-frequency data. The reinforcement learning network employs long short-term memory (LSTM). Actions are determined by the LSTM network or randomly. Third, it learns the optimal investment trading system using the actions of a given transaction and appropriate rewards. The structure of the optimal investment trading system obtained by the proposed deep reinforcement learning structure improves trading performance without requiring the construction of a predictive model. To investigate the performance of the proposed structure, we applied the proposed structure (HW_LSTM_RL) and other reinforcement learning structures to the S&P500, DJI, and KOSPI200 indices for comparison. We evaluated the difference in Sharpe ratio for various test periods (one to three years) and for different rewards. Using the decomposed high-frequency data as input, a portfolio of investment transactions was improved for highly volatile markets. In deep reinforcement learning, we found that network composition and appropriate rewards have significant influence on learning transactions in financial time-series data. Thus, the proposed HW_LSTM_RL structure demonstrates the importance of input data composition, learning network settings, and rewards.
Deep-Reinforcement-Learning-Based Object Transportation Using Task Space Decomposition
This paper presents a novel object transportation method using deep reinforcement learning (DRL) and the task space decomposition (TSD) method. Most previous studies on DRL-based object transportation worked well only in the specific environment where a robot learned how to transport an object. Another drawback was that DRL only converged in relatively small environments. This is because the existing DRL-based object transportation methods are highly dependent on learning conditions and training environments; they cannot be applied to large and complicated environments. Therefore, we propose a new DRL-based object transportation method that decomposes a difficult task space into multiple simple sub-task spaces using the TSD method. First, a robot sufficiently learned how to transport an object in a standard learning environment (SLE) that has small and symmetric structures. Then, a whole-task space was decomposed into several sub-task spaces by considering the size of the SLE, and we created sub-goals for each sub-task space. Finally, the robot transported an object by sequentially occupying the sub-goals. The proposed method can be extended to a large and complicated new environment as well as the training environment without additional learning or re-learning. Simulations in different environments, such as a long corridor, polygons, and a maze, are presented to verify the proposed method.
Power Allocation and Energy Cooperation for UAV-Enabled MmWave Networks: A Multi-Agent Deep Reinforcement Learning Approach
Unmanned Aerial Vehicle (UAV)-assisted cellular networks over the millimeter-wave (mmWave) frequency band can meet the requirements of a high data rate and flexible coverage in next-generation communication networks. However, higher propagation loss and the use of a large number of antennas in mmWave networks give rise to high energy consumption, and UAVs are constrained by their low-capacity onboard batteries. Energy harvesting (EH) is a viable solution to reduce the energy cost of UAV-enabled mmWave networks. However, the random nature of renewable energy makes it challenging to maintain robust connectivity in UAV-assisted terrestrial cellular networks. Energy cooperation allows UAVs to send their excess energy to other UAVs with reduced energy. In this paper, we propose a power allocation algorithm based on energy harvesting and energy cooperation to maximize the throughput of a UAV-assisted mmWave cellular network. Since there is channel-state uncertainty and the amount of harvested energy can be treated as a stochastic process, we propose an optimal multi-agent deep reinforcement learning (DRL) algorithm named Multi-Agent Deep Deterministic Policy Gradient (MADDPG) to solve the renewable energy resource allocation problem for throughput maximization. The simulation results show that the proposed algorithm outperforms the Random Power (RP), Maximal Power (MP) and value-based Deep Q-Learning (DQL) algorithms in terms of network throughput.
Enhancing Building Energy Management: Adaptive Edge Computing for Optimized Efficiency and Inhabitant Comfort
In contemporary building and energy management systems (BEMSs), the predominant approach involves rule-based methodologies, typically employing supervised or unsupervised learning, to deliver energy-saving recommendations to building occupants. However, these BEMSs often suffer from a critical limitation: they are primarily trained on building energy data alone, disregarding crucial elements such as occupant comfort and preferences. This inherent lack of adaptability to occupants significantly hampers the effectiveness of energy-saving solutions. Moreover, the prevalent cloud-based nature of these systems introduces elevated cybersecurity risks and substantial data transmission overheads. In response to these challenges, this article introduces an edge computing architecture grounded in virtual organizations, federated learning, and deep reinforcement learning algorithms, tailored to optimize energy consumption within buildings/homes and facilitate demand response. By integrating energy efficiency measures within virtual organizations, which dynamically learn from real-time inhabitant data while prioritizing comfort, our approach optimizes inhabitant consumption patterns and improves energy efficiency in the built environment.
Deep Reinforcement Learning with Interactive Feedback in a Human–Robot Environment
Robots are extending their presence in domestic environments every day, and it is increasingly common to see them carrying out tasks in home scenarios. In the future, robots are expected to perform increasingly complex tasks and, therefore, to be able to acquire experience from different sources as quickly as possible. A plausible approach to address this issue is interactive feedback, where a trainer advises a learner on which actions should be taken from specific states to speed up the learning process. Moreover, deep reinforcement learning has recently been widely used in robotics to learn the environment and acquire new skills autonomously. However, an open issue when using deep reinforcement learning is the excessive time needed to learn a task from raw input images. In this work, we propose a deep reinforcement learning approach with interactive feedback to learn a domestic task in a Human–Robot scenario. We compare three different learning methods using a simulated robotic arm for the task of organizing different objects; the proposed methods are (i) deep reinforcement learning (DeepRL); (ii) interactive deep reinforcement learning using a previously trained artificial agent as an advisor (agent–IDeepRL); and (iii) interactive deep reinforcement learning using a human advisor (human–IDeepRL). We demonstrate that interactive approaches provide advantages for the learning process. The obtained results show that a learner agent, using either agent–IDeepRL or human–IDeepRL, completes the given task earlier and makes fewer mistakes compared to the autonomous DeepRL approach.