Catalogue Search | MBRL
Explore the vast range of titles available.
5,191 result(s) for "neural networks compression"
A generic deep learning architecture optimization method for edge device based on start-up latency reduction
by Meng, Lin; Li, Qi; Li, Hengyi
in Algorithms, Artificial intelligence, Central processing units
2024
In the promising Artificial Intelligence of Things technology, deep learning algorithms are implemented on edge devices to process data locally. However, high-performance deep learning algorithms come with increased computation and parameter storage costs, making it difficult to implement huge deep learning algorithms on memory- and power-constrained edge devices, such as smartphones and drones. Thus, various compression methods have been proposed, such as channel pruning. According to an analysis of low-level operations on edge devices, existing channel pruning methods have a limited effect on latency optimization. Due to data processing operations, the pruned residual blocks still incur significant latency, which hinders real-time processing of CNNs on edge devices. Hence, we propose a generic deep learning architecture optimization method to achieve further acceleration on edge devices. The network is optimized in two stages, Global Constraint and Start-up Latency Reduction, so that both channels and residual blocks are pruned. Optimized networks are evaluated on desktop CPU, FPGA, ARM CPU, and PULP platforms. The experimental results show that latency is reduced by up to 70.40%, which is 13.63% more than applying channel pruning alone, achieving real-time processing on the edge device.
Journal Article
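The channel-pruning step that the abstract above builds on is not spelled out there; as a rough illustration of the general idea only (not the authors' two-stage method), the sketch below scores the output channels of a convolution layer by the L1 norm of their filters and drops the weakest ones. The layer shapes and pruning ratio are arbitrary assumptions.

```python
import numpy as np

def prune_channels(weight, ratio=0.5):
    """Drop the output channels of a conv layer with the smallest L1 norms.

    weight: array of shape (out_channels, in_channels, kH, kW)
    ratio:  fraction of output channels to remove
    Returns the pruned weight tensor and the indices of the kept channels.
    """
    scores = np.abs(weight).reshape(weight.shape[0], -1).sum(axis=1)  # L1 norm per channel
    n_keep = max(1, int(round(weight.shape[0] * (1.0 - ratio))))
    keep = np.sort(np.argsort(scores)[-n_keep:])                      # strongest channels
    return weight[keep], keep

# Toy example: a 3x3 conv with 64 output channels, prune half of them.
rng = np.random.default_rng(0)
w = rng.normal(size=(64, 32, 3, 3))
w_pruned, kept = prune_channels(w, ratio=0.5)
print(w.shape, "->", w_pruned.shape)   # (64, 32, 3, 3) -> (32, 32, 3, 3)
```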
A Review of Binarized Neural Networks
2019
In this work, we review Binarized Neural Networks (BNNs). BNNs are deep neural networks that use binary values for activations and weights instead of full-precision values. With binary values, BNNs can execute computations using bitwise operations, which reduces execution time. Model sizes of BNNs are much smaller than their full-precision counterparts. While the accuracy of a BNN model is generally lower than that of full-precision models, BNNs have been closing the accuracy gap and are becoming more accurate on larger datasets such as ImageNet. BNNs are also good candidates for deep learning implementations on FPGAs and ASICs due to their bitwise efficiency. We give a tutorial on the general BNN methodology and review various contributions, implementations, and applications of BNNs.
Journal Article
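To make the "bitwise operations" point concrete, here is a minimal sketch (not taken from the paper) of a binarized dot product: weights and activations are mapped to {-1, +1}, packed into bitmasks, and the dot product is recovered with XNOR and popcount. The helper names and 64-element vector length are illustrative assumptions.

```python
import numpy as np

def pack_bits(signs):
    """Pack a +/-1 vector into an integer bitmask (+1 -> 1, -1 -> 0)."""
    bits = 0
    for i, s in enumerate(signs):
        if s > 0:
            bits |= 1 << i
    return bits

def binary_dot(a_bits, w_bits, n):
    """Dot product of two length-n {-1,+1} vectors from their bitmasks.

    matches = popcount(XNOR(a, w)) over the n valid bits;
    dot = matches - mismatches = 2 * matches - n.
    """
    mask = (1 << n) - 1
    matches = bin(~(a_bits ^ w_bits) & mask).count("1")
    return 2 * matches - n

# Check against the ordinary dot product on random sign vectors.
rng = np.random.default_rng(1)
a = rng.choice([-1, 1], size=64)
w = rng.choice([-1, 1], size=64)
assert binary_dot(pack_bits(a), pack_bits(w), 64) == int(np.dot(a, w))
```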
Power Efficient Machine Learning Models Deployment on Edge IoT Devices
by Keramidas, George; Fanariotis, Anastasios; Orphanoudakis, Theofanis
in Accuracy, Algorithms, Analysis
2023
Computing has undergone a significant transformation over the past two decades, shifting from a machine-based approach to a human-centric, virtually invisible service known as ubiquitous or pervasive computing. This change has been achieved by incorporating small embedded devices into a larger computational system, connected through networking and referred to as edge devices. When these devices are also connected to the Internet, they are generally called Internet-of-Things (IoT) devices. Developing Machine Learning (ML) algorithms on these types of devices allows them to provide Artificial Intelligence (AI) inference functions such as computer vision and pattern recognition. However, this capability is severely limited by the devices' resource scarcity: embedded devices have limited computational and power resources while they must maintain a high degree of autonomy. While several published studies address the computational weakness of these small systems, mostly through optimization and compression of neural networks, they often neglect the power consumption and efficiency implications of these techniques. This study presents experimental power-efficiency results from applying well-known and proven optimization methods to a set of well-known ML models. The results are presented with the "real world" functionality of the devices in mind and are compared against the baseline "idle" power consumption of each of the selected systems. Two systems with completely different architectures and capabilities were used, yielding results that lead to interesting conclusions about the power efficiency of each architecture.
Journal Article
A comprehensive review of model compression techniques in machine learning
by Sabino da Silva, Waldir; Cordeiro, Lucas Carvalho; Dantas, Pierre Vilar
in Artificial intelligence, Complexity, Computational efficiency
2024
This paper critically examines model compression techniques within the machine learning (ML) domain, emphasizing their role in enhancing model efficiency for deployment in resource-constrained environments such as mobile devices, edge computing, and Internet of Things (IoT) systems. By systematically exploring compression techniques and lightweight design architectures, it provides a comprehensive understanding of their operational contexts and effectiveness. The synthesis of these strategies reveals a dynamic interplay between model performance and computational demand, highlighting the balance required for optimal application. As ML models grow increasingly complex and data-intensive, the demand for computational resources and memory has surged accordingly. This escalation presents significant challenges for the deployment of artificial intelligence (AI) systems in real-world applications, particularly where hardware capabilities are limited. Therefore, model compression techniques are not merely advantageous but essential for ensuring that these models can be used across various domains while maintaining high performance without prohibitive resource requirements. Furthermore, this review underscores the importance of model compression in sustainable AI development. The introduction of hybrid methods, which combine multiple compression techniques, promises to deliver superior performance and efficiency. Additionally, the development of intelligent frameworks capable of selecting the most appropriate compression strategy for a specific application is crucial for advancing the field. The practical examples and engineering applications discussed demonstrate the real-world impact of these techniques. By optimizing the balance between model complexity and computational efficiency, model compression ensures that advances in AI technology remain sustainable and widely applicable. This comprehensive review thus contributes to the academic discourse and guides innovative solutions for efficient and responsible machine learning practices, paving the way for future advancements in the field.
Journal Article
Interpretable Task-inspired Adaptive Filter Pruning for Neural Networks Under Multiple Constraints
2024
Existing methods for filter pruning mostly rely on specific data-driven paradigms but lack interpretability. Moreover, these approaches usually assign layer-wise compression ratios under a given FLOPs budget either automatically, via neural architecture search algorithms, or simply manually, both of which are inefficient. In this paper, we propose a novel interpretable task-inspired adaptive filter pruning method for neural networks to solve the above problems. First, we treat filters as semantic detectors and develop task-inspired importance criteria by evaluating correlations between input tasks and feature maps and by observing the information flow through filters between adjacent layers. Second, we draw on human neurobiological mechanisms for better interpretability, where the retained first-layer filters act as individual information receivers. Third, inspired by the observation that each filter has a deterministic impact on FLOPs and network parameters, we provide an efficient adaptive compression ratio allocation strategy based on a differentiable pruning approximation under multiple budget constraints, while also accounting for the performance objective. The proposed method is validated with extensive experiments on state-of-the-art neural networks; it significantly outperforms existing filter pruning methods and achieves the best trade-off between neural network compression and task performance. With ResNet-50 on ImageNet, our approach removes 75.49% of parameters and 70.90% of FLOPs with only 2.31% performance degradation.
Journal Article
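As a simplified illustration of allocating pruning under a budget (not the task-inspired criteria or the differentiable allocation described in the abstract above), the sketch below greedily removes the globally least important filters, scored here by an assumed L1 norm, until an assumed FLOPs budget is met. The layer shapes and per-filter FLOPs counts are made up for the example.

```python
import numpy as np

def prune_to_flops_budget(layers, budget_ratio=0.7):
    """Greedily drop the least important filters network-wide until the
    total FLOPs fall below budget_ratio of the original.

    layers: list of dicts with keys
        'weight'      -> (out_ch, in_ch, k, k) filter tensor
        'flops_per_f' -> FLOPs contributed by one filter of this layer
    Importance is the L1 norm of each filter (an assumption, not the
    paper's task-inspired criterion).
    """
    entries = []  # (importance, layer index, filter index, flops) per filter
    for li, layer in enumerate(layers):
        w = layer['weight']
        scores = np.abs(w).reshape(w.shape[0], -1).sum(axis=1)
        for fi, s in enumerate(scores):
            entries.append((s, li, fi, layer['flops_per_f']))
    entries.sort()  # least important first

    total = sum(l['weight'].shape[0] * l['flops_per_f'] for l in layers)
    target = budget_ratio * total
    removed = {li: set() for li in range(len(layers))}
    for s, li, fi, flops in entries:
        if total <= target:
            break
        if len(removed[li]) < layers[li]['weight'].shape[0] - 1:  # keep >= 1 filter per layer
            removed[li].add(fi)
            total -= flops
    return removed  # per-layer sets of filter indices to drop

rng = np.random.default_rng(2)
layers = [{'weight': rng.normal(size=(16, 8, 3, 3)), 'flops_per_f': 8 * 9 * 32 * 32},
          {'weight': rng.normal(size=(32, 16, 3, 3)), 'flops_per_f': 16 * 9 * 16 * 16}]
print({k: len(v) for k, v in prune_to_flops_budget(layers, 0.7).items()})
```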
Differential Evolution Based Layer-Wise Weight Pruning for Compressing Deep Neural Networks
by Li, Na; Wu, Tao; Li, Xiaoyang
in differential evolution, neural network compression, sparse network
2021
Deep neural networks have evolved significantly in the past decades and are now able to achieve better processing of sensor data. Nonetheless, most deep models follow the ruling maxim in deep learning, "bigger is better," and therefore have very complex structures. As the models become more complex, their computational complexity and resource consumption increase significantly, making them difficult to run on resource-limited platforms such as sensor platforms. In this paper, we observe that different layers often have different pruning requirements, and propose a differential-evolution-based layer-wise weight pruning method. First, the pruning sensitivity of each layer is analyzed, and then the network is compressed by iterating the weight pruning process. Unlike methods that set pruning ratios greedily or by statistical analysis, we establish an optimization model to find the optimal pruning sensitivity for each layer. Differential evolution, an effective population-based optimization method, is used to address this task. Furthermore, we adopt a strategy to recover some of the removed connections during the fine-tuning phase to increase the capacity of the pruned model. The effectiveness of our method is demonstrated in experimental studies: it compresses the number of weight parameters in LeNet-300-100, LeNet-5, AlexNet, and VGG16 by 24×, 14×, 29×, and 12×, respectively.
Journal Article
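To give a feel for searching layer-wise pruning ratios with differential evolution, here is a minimal sketch under stated assumptions: a toy three-layer "network" of random matrices and a toy fitness that trades off remaining parameters against the magnitude of the weights removed. The real method would use a validation-loss-based fitness and a more careful recovery step; nothing here is the authors' implementation.

```python
import numpy as np
from scipy.optimize import differential_evolution

rng = np.random.default_rng(3)
# Toy "network": three weight matrices of different sizes.
weights = [rng.normal(size=s) for s in [(300, 100), (100, 100), (100, 10)]]

def prune_layer(w, ratio):
    """Zero out the smallest-magnitude fraction `ratio` of weights in w."""
    if ratio <= 0:
        return w
    thresh = np.quantile(np.abs(w), ratio)
    return np.where(np.abs(w) >= thresh, w, 0.0)

def objective(ratios, alpha=1.0):
    """Toy fitness: remaining parameters plus a crude 'damage' proxy
    (total magnitude removed per layer). A real setup would use validation loss."""
    remaining = sum((prune_layer(w, r) != 0).sum() for w, r in zip(weights, ratios))
    damage = sum(np.abs(w[np.abs(w) < np.quantile(np.abs(w), r)]).sum() if r > 0 else 0.0
                 for w, r in zip(weights, ratios))
    total = sum(w.size for w in weights)
    return remaining / total + alpha * damage / total

# Search a pruning ratio in [0, 0.95] for each layer.
result = differential_evolution(objective, bounds=[(0.0, 0.95)] * len(weights),
                                maxiter=30, seed=0, tol=1e-6)
print("per-layer pruning ratios:", np.round(result.x, 2))
```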
Pruning Deep Neural Networks for Green Energy-Efficient Models: A Survey
by Hussain, Amir; Ayed, Mounir Ben; Fourati, Rahma
in Approximation, Artificial Intelligence, Artificial neural networks
2024
Over the past few years, larger and deeper neural network models, particularly convolutional neural networks (CNNs), have consistently advanced state-of-the-art performance across various disciplines. Yet the computational demands of these models have escalated exponentially. Intensive computation not only hinders research inclusiveness and deployment on resource-constrained devices, such as edge Internet of Things (IoT) devices, but also results in a substantial carbon footprint. Green deep learning has emerged as a research field that emphasizes energy consumption and carbon emissions during model training and inference, aiming to innovate with light and energy-efficient neural networks. Various techniques are available to achieve this goal. Studies show that conventional deep models often contain redundant parameters that do not alter outcomes significantly, which underpins the theoretical basis for model pruning. Consequently, this timely review seeks to systematically summarize recent breakthroughs in CNN pruning methods, offering the necessary background knowledge for researchers in this interdisciplinary domain. We also spotlight the challenges of current model pruning methods to inform future avenues of research. Additionally, the survey highlights the pressing need for innovative metrics that effectively balance diverse pruning objectives. Lastly, it investigates pruning techniques oriented towards sophisticated deep learning models, including hybrid feedforward CNNs and long short-term memory (LSTM) recurrent neural networks, a field ripe for exploration within green deep learning research.
Journal Article
No Fine-Tuning, No Cry: Robust SVD for Compressing Deep Networks
2021
A common technique for compressing a neural network is to compute, via SVD, the k-rank ℓ2 approximation A_k of the matrix A ∈ R^(n×d) that corresponds to a fully connected layer (or embedding layer). Here, d is the number of input neurons in the layer, n is the number in the next one, and A_k is stored in O((n+d)k) memory instead of O(nd). Then, a fine-tuning step is used to improve this initial compression. However, end users may not have the required computation resources, time, or budget to run this fine-tuning stage. Furthermore, the original training set may not be available. In this paper, we provide an algorithm for compressing neural networks with a similar initial compression time to common techniques but without the fine-tuning step. The main idea is to replace the k-rank ℓ2 approximation with an ℓp approximation, for p ∈ [1, 2], which is known to be less sensitive to outliers but much harder to compute. Our main technical result is a practical and provable approximation algorithm that computes it for any p ≥ 1, based on modern techniques in computational geometry. Extensive experimental results on the GLUE benchmark for compressing the networks BERT, DistilBERT, XLNet, and RoBERTa confirm this theoretical advantage.
Journal Article
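The standard k-rank ℓ2 compression that the paper above starts from (not its robust ℓp variant) is easy to sketch: factor the layer matrix A with a truncated SVD and store the two thin factors, so a fully connected layer needs (n+d)k numbers instead of nd. The sizes below are arbitrary, and a random Gaussian matrix has no low-rank structure, so the reconstruction error is only meant to show the mechanics.

```python
import numpy as np

def compress_layer(A, k):
    """Truncated-SVD compression of a dense layer matrix A (n x d).

    Returns factors U_k (n x k) and V_k (k x d) with A ~= U_k @ V_k,
    storing (n + d) * k values instead of n * d.
    """
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    U_k = U[:, :k] * s[:k]          # fold singular values into the left factor
    V_k = Vt[:k, :]
    return U_k, V_k

rng = np.random.default_rng(4)
n, d, k = 512, 256, 32
A = rng.normal(size=(n, d))
U_k, V_k = compress_layer(A, k)
x = rng.normal(size=d)
y_full = A @ x                      # original layer
y_low  = U_k @ (V_k @ x)            # compressed layer: two small matmuls
print("stored values:", (n + d) * k, "vs", n * d)
print("relative error:", np.linalg.norm(y_full - y_low) / np.linalg.norm(y_full))
```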
APInf: Adaptive Policy Inference Based on Hierarchical Framework
by Zhang, Hongjie; Wang, Shuo; Li, Jing
in adaptive policy inference, Deep reinforcement learning, dynamic neural network
2022
Deep reinforcement learning has made adequate progress in addressing complex visual tasks. However, policy inference in relatively large neural networks is costly. Inspired by the fact that decisions at some states of a given task are easy to make, we propose the APInf framework, which achieves low policy-inference cost with quality guarantees. A two-stage training scheme is proposed: first, to accelerate inference at easy states, sub-policy networks ordered by increasing capacity are generated; second, to preserve policy quality, a master-policy is designed to dynamically choose a suitable sub-policy network according to state difficulty. We also improve the efficiency of master-policy inference by sharing the convolution layer between the sub-policy networks and the master-policy, and then train the master-policy under an extended MDP. Extensive experiments conducted in Gym environments show that the adaptive inference framework reduces FLOPs by 41.2% while maintaining similar quality. Furthermore, APInf outperforms existing dynamic network structures in terms of policy quality and inference cost.
Journal Article
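A rough forward-pass sketch of the hierarchical idea, under heavy assumptions: a shared encoder (standing in for the shared convolution layer) is computed once, a small master head picks one of two sub-policy heads of different capacity, and only that head is evaluated. The layer sizes, random weights, and absence of any RL training are all illustrative; this is not the authors' architecture.

```python
import numpy as np

rng = np.random.default_rng(5)

def mlp(sizes):
    """Random weights for a small fully connected policy head (illustrative)."""
    return [rng.normal(scale=0.1, size=(a, b)) for a, b in zip(sizes[:-1], sizes[1:])]

def forward(layers, x):
    for W in layers[:-1]:
        x = np.maximum(x @ W, 0.0)           # ReLU
    return x @ layers[-1]                     # logits

FEAT, ACT = 64, 4
shared_encoder = mlp([16, FEAT])              # stands in for the shared conv layers
sub_policies = [mlp([FEAT, 32, ACT]),         # small head for "easy" states
                mlp([FEAT, 128, 128, ACT])]   # large head for "hard" states
master = mlp([FEAT, 16, len(sub_policies)])   # picks which head to run

def act(obs):
    feat = np.maximum(obs @ shared_encoder[0], 0.0)   # shared features, computed once
    choice = int(np.argmax(forward(master, feat)))    # master-policy selects a head
    logits = forward(sub_policies[choice], feat)      # only that head is evaluated
    return int(np.argmax(logits)), choice

action, head = act(rng.normal(size=16))
print("action:", action, "| sub-policy used:", head)
```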
Smoothed per-tensor weight quantization: a robust solution for neural network deployment
2025
This paper introduces a novel method to improve the outcomes of per-tensor weight quantization, focusing on computational efficiency and compatibility with resource-constrained hardware. Addressing the inherent challenges of depth-wise convolutions, the proposed smooth quantization technique redistributes weight magnitude disparities to the pre-activation data, thereby equalizing channel-wise weight magnitudes. This adjustment enables more effective application of uniform quantization schemes. Experimental evaluations on the ImageNet classification benchmark demonstrate substantial performance gains across modern architectures and training strategies. The proposed method improves the accuracy of per-tensor quantization without noticeable computational overhead, making it a practical solution for edge-device deployments.
Journal Article
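The core idea above, moving channel-wise magnitude disparities out of the weights so that a single per-tensor scale fits all channels, can be sketched roughly as follows. This is an illustration in the spirit of the abstract, not the paper's exact procedure: the per-channel factors here are simply each channel's max magnitude, and in a real deployment they would be folded into the preceding activation or a per-channel output rescale.

```python
import numpy as np

def smooth_and_quantize(W, n_bits=8):
    """Per-tensor symmetric quantization after channel-wise smoothing.

    W: (out_channels, in_features) weight matrix.
    Each output channel is divided by its own factor s_c so all channels have
    comparable magnitude; the factors must be re-applied downstream so the
    layer's function is unchanged.
    """
    s = np.abs(W).max(axis=1, keepdims=True)           # per-channel magnitude
    s = np.maximum(s, 1e-8)
    W_smooth = W / s                                    # equalized weights
    qmax = 2 ** (n_bits - 1) - 1
    scale = np.abs(W_smooth).max() / qmax               # one scale for the whole tensor
    W_q = np.clip(np.round(W_smooth / scale), -qmax - 1, qmax).astype(np.int8)
    return W_q, scale, s

rng = np.random.default_rng(6)
W = rng.normal(size=(8, 16)) * rng.uniform(0.1, 10.0, size=(8, 1))  # uneven channels
W_q, scale, s = smooth_and_quantize(W)
W_hat = (W_q.astype(np.float32) * scale) * s           # dequantize and undo smoothing
print("max abs error:", np.abs(W - W_hat).max())
```

Without the smoothing step, the single per-tensor scale would be dominated by the largest channel and the small-magnitude channels would quantize to only a few levels, which is exactly the depth-wise convolution failure mode the abstract describes.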