190 result(s) for "Zhao, Dongbin"
Building Energy Consumption Prediction: An Extreme Deep Learning Approach
Building energy consumption prediction plays an important role in improving the energy utilization rate through helping building managers to make better decisions. However, as a result of randomness and noisy disturbance, it is not an easy task to realize accurate prediction of the building energy consumption. In order to obtain better building energy consumption prediction accuracy, an extreme deep learning approach is presented in this paper. The proposed approach combines stacked autoencoders (SAEs) with the extreme learning machine (ELM) to take advantage of their respective characteristics. In this proposed approach, the SAE is used to extract the building energy consumption features, while the ELM is utilized as a predictor to obtain accurate prediction results. To determine the input variables of the extreme deep learning model, the partial autocorrelation analysis method is adopted. Additionally, in order to examine the performances of the proposed approach, it is compared with some popular machine learning methods, such as the backward propagation neural network (BPNN), support vector regression (SVR), the generalized radial basis function neural network (GRBFNN) and multiple linear regression (MLR). Experimental results demonstrate that the proposed method has the best prediction performance in different cases of the building energy consumption.
Comprehensive comparison of online ADP algorithms for continuous-time optimal control
Online learning is an important property of adaptive dynamic programming (ADP). Online observations contain plentiful dynamics information, and ADP algorithms can utilize them to learn the optimal control policy. This paper reviews the research on online ADP algorithms for the optimal control of continuous-time systems. Through intensive study, ADP has developed toward model-free and data-efficient methods. After introducing the algorithms separately, we compare their performance on the same problem. This paper aims to provide a comprehensive understanding of continuous-time online ADP algorithms.
Deep Reinforcement Learning for Perception and Control of Autonomous Vehicles
Deep reinforcement learning effectively combines the decision-making ability of reinforcement learning with the perceptual ability of deep learning, and it has achieved a series of milestones in the field of artificial intelligence in recent years. The method has also been applied with promising results in the field of intelligent driving, which has strongly promoted the development of key technologies for autonomous vehicles. This report briefly reviews the development of deep reinforcement learning methods and specifically introduces the research progress of my team in perception and decision making of autonomous vehicles based on deep reinforcement learning methods.
Preoperative versus postoperative chemo-radiotherapy for locally advanced gastric cancer: a multicenter propensity score-matched analysis
Background: Peri-operative chemo-radiotherapy plays an important role in locally advanced gastric cancer. Whether a preoperative strategy can improve the long-term prognosis compared with postoperative treatment is unclear. This study aimed to compare oncologic outcomes in locally advanced gastric cancer patients treated with preoperative chemo-radiotherapy (pre-CRT) and postoperative chemo-radiotherapy (post-CRT). Methods: From January 2009 to April 2019, 222 patients from 2 centers with stage T3/4 and/or N-positive gastric cancer who received pre-CRT or post-CRT were included. After propensity score matching (PSM), comparisons of local regional control (LC), distant metastasis-free survival (DMFS), disease-free survival (DFS) and overall survival (OS) were performed using Kaplan-Meier analysis and the log-rank test between the pre- and post-CRT groups. Results: The median follow-up period was 30 months. 120 matched cases were generated for analysis. Three-year LC, DMFS, DFS and OS for the pre- vs. post-CRT groups were 93.8% vs. 97.2% (p = 0.244), 78.7% vs. 65.7% (p = 0.017), 74.9% vs. 65.3% (p = 0.042) and 74.4% vs. 61.2% (p = 0.055), respectively. Pre-CRT was significantly associated with DFS in uni- and multi-variate analysis. Conclusion: Preoperative CRT showed advantages in oncologic outcome compared with postoperative CRT. Trial registration: ClinicalTrials.gov NCT01291407, NCT03427684 and NCT04062058; date of registration: Feb 8, 2011.
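The abstract above names Kaplan-Meier analysis as the tool for comparing survival outcomes. As a rough illustration of what that estimator computes, the following is a minimal sketch in plain Python; it is not the study's analysis code. The function name `kaplan_meier` and the toy follow-up data are hypothetical, and the sketch assumes the usual convention that a subject censored at time t still counts as at risk at t.

```python
def kaplan_meier(durations, events):
    """Kaplan-Meier survival estimate.

    durations: follow-up time for each subject.
    events:    1 if the event (e.g. death) was observed, 0 if censored.
    Returns a list of (time, survival_probability) at each event time.
    """
    data = sorted(zip(durations, events))
    at_risk = len(data)      # subjects still under observation
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all subjects whose follow-up ends at the same time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            # Multiply by the conditional probability of surviving past t.
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


if __name__ == "__main__":
    # Hypothetical toy data: follow-up in months, 0 marks a censored subject.
    for time, s in kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0]):
        print(f"t={time}: S(t)={s:.3f}")
```

Comparing two such curves, as the study does with the log-rank test, additionally requires a significance test, which this sketch leaves out.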
Vision-based control in the open racing car simulator with deep and reinforcement learning
After decades of development, computer intelligence has reached a high level. In particular, deep learning (DL) and reinforcement learning (RL) endow computers with perception and decision-making abilities. This paper aims to design a vision-based system that is able to play The Open Racing Car Simulator (TORCS) from images, as a human player does. With the DL-trained perception module, useful low-dimensional information is extracted from first-person images. Based on that information, the RL-trained module controls the simulated car to keep it in the middle of the lane. The two modules are trained separately, so that the advantages of both DL and RL are fully utilized. Experiments on different tracks show the promising performance of the method.
Applications of functionalized ionic liquids
Recent developments of the synthesis and applications of functionalized ionic liquids (including dual-functionalized ionic liquids) have been highlighted in this review. Ionic liquids are attracting attention as alternative solvents in green chemistry, but as more functionalized ILs are prepared, a greater number of applications in increasingly diverse fields are found.
Learning future representation with synthetic observations for sample-efficient reinforcement learning
Image-based reinforcement learning (RL) has proven its effectiveness for continuous visual control of embodied agents, where upstream representation learning largely determines the effectiveness of policy learning. Employing self-supervised auxiliary tasks allows the agent to enhance visual representation in a targeted manner, thereby improving the policy performance and the RL sample efficiency. Prior advanced self-supervised RL methods all try to design better auxiliary objectives to extract more information from agent experience, while ignoring the training data constraints caused by experience limitations in RL training. In this article, we first try to break through this auxiliary training data constraint, proposing a novel RL auxiliary task named learning future representation with synthetic observations (LFS), which improves the self-supervised RL by enriching auxiliary training data. Firstly, a novel training-free method, named frame mask, is proposed to synthesize novel observations that may contain future information. Next, the latent nearest-neighbor clip (LNC) is correspondingly proposed to alleviate the impact of unqualified noise in synthetic observations. The remaining synthetic observations and real observations then together serve as the auxiliary training data to achieve a clustering-based temporal association task for advanced representation learning. LFS allows the agent to access and learn observations that are not present in the current experience but will appear in future training, thus enabling comprehensive visual understanding and an efficient RL process. In addition, LFS does not rely on rewards or actions, which means it has a wider scope of application (e.g., learning from video) than recent advanced RL auxiliary tasks. We conduct extensive experiments on challenging continuous visual control of complex embodied agents, including robot locomotion and manipulation. 
The results demonstrate that our LFS exhibits state-of-the-art sample efficiency on end-to-end RL tasks (leading on 12/13 tasks), and enables advanced RL visual pre-training (outperforming the next best method by 1.51×) on action-free video demonstrations.
A supervised Actor–Critic approach for adaptive cruise control
A novel supervised Actor–Critic (SAC) approach for the adaptive cruise control (ACC) problem is proposed in this paper. The key elements required by the SAC algorithm, namely the Actor and the Critic, are each approximated by a feed-forward neural network. The output of the Actor and the state are input to the Critic to approximate the performance index function. A Lyapunov stability analysis is presented to prove that the estimation errors of the neural networks are uniformly ultimately bounded. Moreover, we use the supervisory controller to pre-train the Actor to achieve a basic control policy, which can improve the training convergence and success rate. We apply this method to learn an approximate optimal control policy for the ACC problem. Experimental results in several driving scenarios demonstrate that the SAC algorithm performs well, so it is feasible and effective for the ACC problem.
Missile guidance with assisted deep reinforcement learning for head-on interception of maneuvering target
In missile guidance, pursuit performance is seriously degraded by the uncertainty and randomness in target maneuverability, detection delay, and environmental noise. Many methods need an accurate estimate of the target's acceleration or the time-to-go to intercept the maneuvering target, which is hard to obtain in an uncertain environment. In this paper, we propose an assisted deep reinforcement learning (ARL) algorithm to optimize the neural network-based missile guidance controller for head-on interception. Based on the relative velocity, distance, and angle, ARL can control the missile to intercept the maneuvering target and achieve a large terminal intercept angle. To reduce the influence of environmental uncertainty, ARL predicts the target’s acceleration as an auxiliary supervised task. The supervised learning task improves the ability of the agent to extract information from observations. To exploit the agent’s good trajectories, ARL introduces Gaussian self-imitation learning, which moves the mean of the action distribution toward the agent’s good actions. Compared with vanilla self-imitation learning, Gaussian self-imitation learning improves exploration in continuous control. Simulation results validate that ARL outperforms traditional methods and the proximal policy optimization algorithm, with a higher hit rate and a larger terminal intercept angle, in a simulation environment with noise, delay, and a maneuverable target.
Adaptive cruise control via adaptive dynamic programming with experience replay
The adaptive cruise control (ACC) problem can be transformed into an optimal tracking control problem for complex nonlinear systems. In this paper, a novel, highly efficient model-free adaptive dynamic programming (ADP) approach with experience replay technology is proposed to design the ACC controller. Experience replay increases the data efficiency by recording the available driving data and repeatedly presenting them to the learning procedure of the acceleration controller in the ACC system. The learning framework that combines ADP with experience replay is described in detail. The distinguishing feature of the algorithm is that when estimating parameters of the critic network and the actor network with gradient rules, the gradients of historical data and current data are used to update parameters concurrently. It is proved with Lyapunov theory that the weight estimation errors of the actor network and the critic network are uniformly ultimately bounded under the novel weight update rules. The learning performance of the ACC controller implemented with this ADP algorithm clearly demonstrates that experience replay can increase data efficiency significantly, and the approximate optimality and adaptability of the learned control policy are tested in typical driving scenarios.
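The idea of updating with the gradients of historical and current data concurrently can be sketched in a few lines. The example below is only an illustration under strong simplifying assumptions, not the paper's ADP algorithm: it uses a linear approximator with a squared-error loss in place of the actor and critic networks, and every name (`ReplayBuffer`, `replay_update`, `demo`) and all data are hypothetical.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size store of past (state, target) samples."""

    def __init__(self, capacity, seed=0):
        self.buffer = deque(maxlen=capacity)  # oldest samples drop out first
        self.rng = random.Random(seed)

    def add(self, state, target):
        self.buffer.append((state, target))

    def sample(self, batch_size):
        k = min(batch_size, len(self.buffer))
        return self.rng.sample(list(self.buffer), k)


def gradient(weights, state, target):
    """Gradient of 0.5 * (w . x - target)^2 with respect to w."""
    error = sum(w * x for w, x in zip(weights, state)) - target
    return [error * x for x in state]


def replay_update(weights, current, buffer, batch_size=4, lr=0.1):
    """One step that averages the gradient of the current sample with
    the gradients of samples replayed from the buffer."""
    samples = [current] + buffer.sample(batch_size)
    total = [0.0] * len(weights)
    for state, target in samples:
        grad = gradient(weights, state, target)
        total = [t + g for t, g in zip(total, grad)]
    new_weights = [w - lr * t / len(samples) for w, t in zip(weights, total)]
    buffer.add(*current)  # record the current sample for later replay
    return new_weights


def demo(steps=1000, seed=1):
    """Learn the hypothetical relation target = 2*x0 - 1*x1 from a stream."""
    rng = random.Random(seed)
    buffer = ReplayBuffer(capacity=100)
    weights = [0.0, 0.0]
    for _ in range(steps):
        state = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
        target = 2.0 * state[0] - 1.0 * state[1]
        weights = replay_update(weights, (state, target), buffer)
    return weights


if __name__ == "__main__":
    print(demo())
```

Each recorded sample contributes to many updates rather than one, which is the sense in which replay raises data efficiency; a real ACC controller would replace the linear model with the actor and critic networks and their own update rules.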