5,168 results for "neural network compression"
A generic deep learning architecture optimization method for edge device based on start-up latency reduction
In the promising Artificial Intelligence of Things technology, deep learning algorithms are implemented on edge devices to process data locally. However, high-performance deep learning algorithms come with increased computation and parameter storage costs, making it difficult to deploy large deep learning models on memory- and power-constrained edge devices, such as smartphones and drones. Thus, various compression methods have been proposed, such as channel pruning. According to an analysis of low-level operations on edge devices, existing channel pruning methods have limited effect on latency optimization. Due to data processing operations, the pruned residual blocks still incur significant latency, which hinders real-time processing of CNNs on edge devices. Hence, we propose a generic deep learning architecture optimization method to achieve further acceleration on edge devices. The network is optimized in two stages, Global Constraint and Start-up Latency Reduction, pruning both channels and residual blocks. Optimized networks are evaluated on desktop CPU, FPGA, ARM CPU, and PULP platforms. The experimental results show that latency is reduced by up to 70.40%, which is 13.63% more than applying channel pruning alone, achieving real-time processing on the edge device.
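The abstract contrasts its two-stage method with plain channel pruning. As a point of reference, here is a minimal sketch of magnitude-based channel pruning, one common baseline; the function name prune_channels and the L1-norm ranking criterion are illustrative assumptions, not the paper's method.

```python
import numpy as np

def prune_channels(conv_weight: np.ndarray, keep_ratio: float) -> tuple[np.ndarray, np.ndarray]:
    """Rank output channels of a conv layer (out, in, kh, kw) by L1 norm
    and keep only the strongest fraction."""
    l1_per_channel = np.abs(conv_weight).reshape(conv_weight.shape[0], -1).sum(axis=1)
    n_keep = max(1, int(round(keep_ratio * conv_weight.shape[0])))
    keep_idx = np.argsort(l1_per_channel)[-n_keep:]
    return conv_weight[keep_idx], keep_idx

# Example: a 64-channel 3x3 conv over 32 input channels, pruned to 50%.
rng = np.random.default_rng(0)
w = rng.normal(size=(64, 32, 3, 3))
w_pruned, kept = prune_channels(w, keep_ratio=0.5)
print(w_pruned.shape)  # (32, 32, 3, 3)
```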
A Review of Binarized Neural Networks
In this work, we review Binarized Neural Networks (BNNs). BNNs are deep neural networks that use binary values for activations and weights instead of full-precision values. With binary values, BNNs can execute computations using bitwise operations, which reduces execution time. Model sizes of BNNs are much smaller than their full-precision counterparts. While the accuracy of a BNN model is generally lower than that of full-precision models, BNNs have been closing the accuracy gap and are becoming more accurate on larger datasets like ImageNet. BNNs are also good candidates for deep learning implementations on FPGAs and ASICs due to their bitwise efficiency. We give a tutorial on the general BNN methodology and review various contributions, implementations, and applications of BNNs.
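A minimal sketch of the binarization idea the review describes: weights and activations are mapped to {-1, +1} with a sign function, and a dot product of such vectors can be recovered from the count of matching positions (XNOR plus popcount on hardware). The helper names are illustrative; this is not any specific BNN from the review.

```python
import numpy as np

def binarize(x: np.ndarray) -> np.ndarray:
    """Deterministic sign binarization used in most BNNs: +1 for x >= 0, -1 otherwise."""
    return np.where(x >= 0, 1, -1).astype(np.int8)

def binary_dot(a_bin: np.ndarray, w_bin: np.ndarray) -> int:
    """Dot product of {-1, +1} vectors: matches minus mismatches, i.e. 2 * matches - n.
    On hardware, 'matches' is an XNOR followed by a popcount."""
    n = a_bin.size
    matches = int(np.count_nonzero(a_bin == w_bin))
    return 2 * matches - n

rng = np.random.default_rng(1)
a = binarize(rng.normal(size=128))
w = binarize(rng.normal(size=128))
# The bitwise-style dot product agrees with the ordinary one.
assert binary_dot(a, w) == int(np.dot(a.astype(np.int32), w.astype(np.int32)))
```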
A comprehensive review of model compression techniques in machine learning
This paper critically examines model compression techniques within the machine learning (ML) domain, emphasizing their role in enhancing model efficiency for deployment in resource-constrained environments, such as mobile devices, edge computing, and Internet of Things (IoT) systems. By systematically exploring compression techniques and lightweight design architectures, it provides a comprehensive understanding of their operational contexts and effectiveness. The synthesis of these strategies reveals a dynamic interplay between model performance and computational demand, highlighting the balance required for optimal application. As ML models grow increasingly complex and data-intensive, the demand for computational resources and memory has surged accordingly. This escalation presents significant challenges for the deployment of artificial intelligence (AI) systems in real-world applications, particularly where hardware capabilities are limited. Therefore, model compression techniques are not merely advantageous but essential for ensuring that these models can be utilized across various domains, maintaining high performance without prohibitive resource requirements. Furthermore, this review underscores the importance of model compression in sustainable AI development. The introduction of hybrid methods, which combine multiple compression techniques, promises to deliver superior performance and efficiency. Additionally, the development of intelligent frameworks capable of selecting the most appropriate compression strategy based on specific application needs is crucial for advancing the field. The practical examples and engineering applications discussed demonstrate the real-world impact of these techniques. By optimizing the balance between model complexity and computational efficiency, model compression ensures that advances in AI technology remain sustainable and widely applicable. This comprehensive review thus contributes to the academic discourse and guides innovative solutions for efficient and responsible machine learning practices, paving the way for future advancements in the field.
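To make the hybrid-methods point concrete, here is a toy sketch that chains two of the techniques such reviews cover, magnitude pruning followed by symmetric 8-bit quantization, on a random weight matrix. The function names and thresholds are assumptions for illustration only.

```python
import numpy as np

def magnitude_prune(w: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the smallest-magnitude fraction of weights."""
    threshold = np.quantile(np.abs(w), sparsity)
    return np.where(np.abs(w) >= threshold, w, 0.0)

def quantize_uniform(w: np.ndarray, n_bits: int = 8) -> tuple[np.ndarray, float]:
    """Symmetric uniform quantization: integer codes plus one float scale."""
    scale = np.max(np.abs(w)) / (2 ** (n_bits - 1) - 1)
    codes = np.round(w / scale).astype(np.int8)
    return codes, scale

rng = np.random.default_rng(2)
w = rng.normal(size=(256, 256))
w_sparse = magnitude_prune(w, sparsity=0.7)     # step 1: prune 70% of weights
codes, scale = quantize_uniform(w_sparse)       # step 2: quantize what remains
w_hat = codes.astype(np.float32) * scale        # dequantized approximation
print(np.mean(np.abs(w - w_hat)))               # reconstruction error of the hybrid pipeline
```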
Differential Evolution Based Layer-Wise Weight Pruning for Compressing Deep Neural Networks
Deep neural networks have evolved significantly over the past decades and are now able to achieve better processing of sensor data. Nonetheless, most deep models follow the ruling maxim of deep learning, "bigger is better," and therefore have very complex structures. As the models become more complex, their computational complexity and resource consumption increase significantly, making them difficult to run on resource-limited platforms, such as sensor platforms. In this paper, we observe that different layers often have different pruning requirements, and propose a differential-evolution-based layer-wise weight pruning method. First, the pruning sensitivity of each layer is analyzed, and then the network is compressed by iterating the weight pruning process. Unlike other methods that determine pruning ratios greedily or by statistical analysis, we establish an optimization model to find the optimal pruning sensitivity set for each layer. Differential evolution is an effective population-based optimization method that can be used to address this task. Furthermore, we adopt a strategy to recover some of the removed connections to increase the capacity of the pruned model during the fine-tuning phase. The effectiveness of our method has been demonstrated in experimental studies. Our method compresses the number of weight parameters in LeNet-300-100, LeNet-5, AlexNet, and VGG16 by 24×, 14×, 29×, and 12×, respectively.
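A toy sketch of searching per-layer pruning ratios with differential evolution, using SciPy's generic differential_evolution optimizer and a synthetic objective. The layer sizes, sensitivity constants, and trade-off weight are made up; this is not the paper's actual optimization model.

```python
import numpy as np
from scipy.optimize import differential_evolution

# Toy setup: four layers with hypothetical parameter counts and sensitivity
# constants; higher sensitivity means accuracy degrades faster when pruned.
param_counts = np.array([235_000, 30_000, 60_000_000, 4_000_000], dtype=float)
sensitivity = np.array([0.2, 0.5, 0.05, 0.1])

def objective(prune_ratios: np.ndarray) -> float:
    """Trade off remaining parameters against a synthetic accuracy-loss proxy."""
    remaining = np.sum(param_counts * (1.0 - prune_ratios)) / np.sum(param_counts)
    acc_loss_proxy = np.sum(sensitivity * prune_ratios ** 2)
    return remaining + 5.0 * acc_loss_proxy

result = differential_evolution(objective, bounds=[(0.0, 0.95)] * 4, seed=0)
print(result.x)  # per-layer pruning ratios found by the evolutionary search
```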
Pruning Deep Neural Networks for Green Energy-Efficient Models: A Survey
Over the past few years, larger and deeper neural network models, particularly convolutional neural networks (CNNs), have consistently advanced state-of-the-art performance across various disciplines. Yet, the computational demands of these models have escalated exponentially. Intensive computations not only hinder research inclusiveness and deployment on resource-constrained devices, such as Edge Internet of Things (IoT) devices, but also result in a substantial carbon footprint. Green deep learning has emerged as a research field that emphasizes energy consumption and carbon emissions during model training and inference, aiming to innovate with light and energy-efficient neural networks. Various techniques are available to achieve this goal. Studies show that conventional deep models often contain redundant parameters that do not alter outcomes significantly, underpinning the theoretical basis for model pruning. Consequently, this timely review paper seeks to systematically summarize recent breakthroughs in CNN pruning methods, offering necessary background knowledge for researchers in this interdisciplinary domain. We then spotlight the challenges of current model pruning methods to inform future avenues of research. Additionally, the survey highlights the pressing need for the development of innovative metrics to effectively balance diverse pruning objectives. Lastly, it investigates pruning techniques oriented towards sophisticated deep learning models, including hybrid feedforward CNNs and long short-term memory (LSTM) recurrent neural networks, a field ripe for exploration within green deep learning research.
APInf: Adaptive Policy Inference Based on Hierarchical Framework
Deep reinforcement learning has made considerable progress in addressing complex visual tasks. However, policy inference in relatively large neural networks is costly. Inspired by the fact that decisions at some states of a given task are easy to make, we propose the APInf framework, which achieves low policy inference cost with quality guarantees. Two-stage training is proposed: first, to accelerate inference at easy states, sub-policy networks ordered by increasing capacity are generated; second, to preserve policy quality, a master-policy is designed to dynamically choose a suitable sub-policy network according to state difficulty. We also improve the efficiency of master-policy inference by sharing the convolution layer between the sub-policy networks and the master-policy, and then train the master-policy under an extended MDP. Extensive experiments conducted in Gym show that the adaptive inference framework reduces FLOPs by 41.2% while maintaining similar quality. Furthermore, APInf outperforms existing dynamic network structures in terms of policy quality and inference cost.
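A toy sketch of the adaptive routing idea: a cheap master-policy decides per state whether a small or a large sub-policy head should produce the action, with both heads sharing the same features. All weights and the gating threshold are random placeholders; this is not the APInf two-stage training procedure.

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical shared feature extractor plus two policy heads of different cost.
W_shared = rng.normal(size=(16, 8))                             # cheap shared features
W_small  = rng.normal(size=(8, 4))                              # low-capacity sub-policy
W_large  = rng.normal(size=(8, 64)), rng.normal(size=(64, 4))   # high-capacity sub-policy
w_gate   = rng.normal(size=8)                                   # master-policy: picks the head

def act(state: np.ndarray, threshold: float = 0.0) -> tuple[int, str]:
    feats = np.tanh(W_shared.T @ state)            # stand-in for the shared conv layer
    if w_gate @ feats < threshold:                 # master-policy says "easy state"
        logits, head = W_small.T @ feats, "small"
    else:                                          # "hard state": pay for the big head
        W1, W2 = W_large
        logits, head = W2.T @ np.tanh(W1.T @ feats), "large"
    return int(np.argmax(logits)), head

state = rng.normal(size=16)
print(act(state))  # (chosen action, which sub-policy was used)
```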
Smoothed per-tensor weight quantization: a robust solution for neural network deployment
This paper introduces a novel method to improve per-tensor weight quantization, focusing on enhancing computational efficiency and compatibility with resource-constrained hardware. Addressing the inherent challenges of depth-wise convolutions, the proposed smooth quantization technique redistributes weight magnitude disparities to the pre-activation data, thereby equalizing channel-wise weight magnitudes. This adjustment enables more effective application of uniform quantization schemes. Experimental evaluations on the ImageNet classification benchmark demonstrate substantial performance gains across modern architectures and training strategies. The proposed method improves the accuracy of per-tensor quantization without noticeable computational overhead, making it a practical solution for edge-device deployments.
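A rough sketch of the rebalancing idea as described in the abstract: per-channel factors equalize channel-wise weight magnitudes before a single per-tensor scale is applied, and the factors must be folded back into the pre-activation path to keep the layer's function unchanged. The exact formulation below is an assumption, not the paper's algorithm.

```python
import numpy as np

def smooth_then_quantize(w: np.ndarray, n_bits: int = 8):
    """Equalize per-output-channel weight magnitudes, then apply per-tensor
    symmetric quantization. Folding the per-channel factors into the
    pre-activation path (e.g. a following bias/BN) is omitted here."""
    per_ch_max = np.max(np.abs(w), axis=1, keepdims=True)         # one max per output channel
    s = per_ch_max / np.mean(per_ch_max)                          # rebalancing factors
    w_smooth = w / s                                              # channels now comparable in scale
    scale = np.max(np.abs(w_smooth)) / (2 ** (n_bits - 1) - 1)    # one scale for the whole tensor
    codes = np.round(w_smooth / scale).astype(np.int8)
    return codes, scale, s

rng = np.random.default_rng(5)
w = rng.normal(size=(32, 64)) * rng.uniform(0.01, 3.0, size=(32, 1))  # disparate channel scales
codes, scale, s = smooth_then_quantize(w)
w_hat = (codes.astype(np.float32) * scale) * s   # undo the smoothing after dequantization
print(np.max(np.abs(w - w_hat)))                 # worst-case reconstruction error
```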
GradFreeBits: Gradient-Free Bit Allocation for Mixed-Precision Neural Networks
Quantized neural networks (QNNs) are among the main approaches for deploying deep neural networks on low-resource edge devices. Training QNNs with different levels of precision throughout the network (mixed-precision quantization) typically achieves superior trade-offs between performance and computational load. However, optimizing the precision levels of QNNs can be complicated, as bit allocations are discrete values that are difficult to differentiate through. Moreover, adequately accounting for the dependencies between the bit allocations of different layers is not straightforward. To meet these challenges, in this work we propose GradFreeBits: a novel joint optimization scheme for training mixed-precision QNNs, which alternates between gradient-based optimization for the weights and gradient-free optimization for the bit allocation. Our method achieves performance better than or on par with current state-of-the-art low-precision classification networks on CIFAR10/100 and ImageNet, semantic segmentation networks on Cityscapes, and several graph neural network benchmarks. Furthermore, our approach can be extended to a variety of other applications involving neural networks used in conjunction with parameters that are difficult to optimize for.
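A toy stand-in for the gradient-free stage: a random search over discrete per-layer bit widths against a synthetic objective that trades quantization error against model size. GradFreeBits' actual optimizer and its alternation with weight training are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(6)
layers = [rng.normal(size=(64, 64)) for _ in range(4)]   # stand-in layer weights

def quant_error(w: np.ndarray, bits: int) -> float:
    """MSE introduced by symmetric uniform quantization at a given bit width."""
    scale = np.max(np.abs(w)) / (2 ** (bits - 1) - 1)
    return float(np.mean((w - np.round(w / scale) * scale) ** 2))

def objective(bit_alloc: np.ndarray, size_weight: float = 0.02) -> float:
    """Accuracy proxy (total quantization MSE) plus a model-size penalty."""
    err = sum(quant_error(w, int(b)) for w, b in zip(layers, bit_alloc))
    size = sum(w.size * int(b) for w, b in zip(layers, bit_alloc)) / 1e6
    return err + size_weight * size

# Gradient-free search over discrete bit widths (2..8 bits per layer).
best_alloc, best_val = None, np.inf
for _ in range(500):
    candidate = rng.integers(2, 9, size=len(layers))
    val = objective(candidate)
    if val < best_val:
        best_alloc, best_val = candidate, val
print(best_alloc, best_val)
```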
Power Efficient Machine Learning Models Deployment on Edge IoT Devices
Computing has undergone a significant transformation over the past two decades, shifting from a machine-based approach to a human-centric, virtually invisible service known as ubiquitous or pervasive computing. This change has been achieved by incorporating small embedded devices into a larger computational system, connected through networking and referred to as edge devices. When these devices are also connected to the Internet, they are generally called Internet-of-Things (IoT) devices. Running Machine Learning (ML) algorithms on such devices allows them to provide Artificial Intelligence (AI) inference functions such as computer vision and pattern recognition. However, this capability is severely limited by the devices' resource scarcity: embedded devices have limited computational and power resources while they must maintain a high degree of autonomy. While several published studies address the computational weakness of these small systems, mostly through optimization and compression of neural networks, they often neglect the power consumption and efficiency implications of these techniques. This study presents power-efficiency experimental results from the application of well-known and proven optimization methods to a set of well-known ML models. The results are presented in the context of the devices' real-world functionality and are compared with the baseline idle power consumption of each selected system. Two systems with completely different architectures and capabilities were used, leading to interesting conclusions about the power efficiency of each architecture.
Evaluation of Deep Neural Network Compression Methods for Edge Devices Using Weighted Score-Based Ranking Scheme
The demand for object detection capability in edge computing systems has surged. As such, the need for lightweight Convolutional Neural Network (CNN)-based object detection models has become a focal point. Current models have large memory footprints, which makes deployment on edge devices demanding, so the models need to be optimized for the hardware without performance degradation. Several model compression methods exist; however, determining the most efficient one is a major concern. Our goal was to rank the performance of these methods using our application as a case study: a real-time vehicle tracking system for cargo ships. To address this, we developed a weighted score-based ranking scheme that utilizes the model performance metrics. We demonstrated the effectiveness of this method by applying it to the baseline, compressed, and micro-CNN models trained on our dataset. The results showed that quantization is the most efficient compression method for this application, having the highest rank with an average weighted score of 9.00, followed by binarization with an average weighted score of 8.07. Our proposed method is extendable and can be used as a framework for selecting suitable model compression methods for edge devices in different applications.
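A small sketch of how such a weighted score-based ranking can be computed: each metric is min-max normalized, inverted where lower is better, weighted, and summed per method. The metric names, values, and weights below are invented for illustration and are not the paper's data.

```python
import numpy as np

# Hypothetical metrics for three compressed variants of one detector; the
# numbers and weights are made up purely to illustrate the scoring idea.
methods = ["quantization", "binarization", "pruning"]
metrics = {                                          # rows follow `methods`
    "mAP":        np.array([0.71, 0.64, 0.69]),      # higher is better
    "latency_ms": np.array([18.0, 15.0, 22.0]),      # lower is better
    "model_MB":   np.array([6.2, 2.1, 9.5]),         # lower is better
}
weights = {"mAP": 0.5, "latency_ms": 0.3, "model_MB": 0.2}
higher_is_better = {"mAP": True, "latency_ms": False, "model_MB": False}

def weighted_scores() -> np.ndarray:
    total = np.zeros(len(methods))
    for name, values in metrics.items():
        norm = (values - values.min()) / (values.max() - values.min())  # scale to 0..1
        if not higher_is_better[name]:
            norm = 1.0 - norm                                           # invert "cost" metrics
        total += weights[name] * norm * 10.0                            # weighted score on 0..10
    return total

scores = weighted_scores()
for method, score in sorted(zip(methods, scores), key=lambda t: -t[1]):
    print(f"{method}: {score:.2f}")
```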