Catalogue Search | MBRL

A survey of researchers’ code sharing and code reuse practices, and assessment of interactive notebook prototypes

by Cadwallader, Lauren , Hrynaszkiewicz, Iain in Airline code sharing , Bioinformatics , Biology

2022

This research aimed to understand the needs and habits of researchers in relation to code sharing and reuse; gather feedback on prototype code notebooks created by NeuroLibre; and help determine strategies that publishers could use to increase code sharing. We surveyed 188 researchers in computational biology. Respondents were asked about how often and why they look at code, which methods of accessing code they find useful and why, what aspects of code sharing are important to them, and how satisfied they are with their ability to complete these tasks. Respondents were asked to look at a prototype code notebook and give feedback on its features. Respondents were also asked how much time they spent preparing code and if they would be willing to increase this to use a code sharing tool, such as a notebook. For readers of research articles, the most common reason (70%) for looking at code was to gain a better understanding of the article. The most commonly encountered method of code sharing, linking articles to a code repository, was also the most useful method of accessing code from the reader’s perspective. As authors, the respondents were largely satisfied with their ability to carry out tasks related to code sharing. The most important of these tasks were ensuring that the code was running in the correct environment, and sharing code with good documentation. The average researcher, according to our results, is unwilling to incur additional costs (in time, effort or expenditure) that are currently needed to use code sharing tools alongside a publication. We infer this means we need different models for funding and producing interactive or executable research outputs if they are to reach a large number of researchers. For the purpose of increasing the amount of code shared by authors, PLOS Computational Biology is, as a result, focusing on policy rather than tools.

Journal Article

Securing Embedded System from Code Reuse Attacks: A Lightweight Scheme with Hardware Assistance

by An, Zhenliang , Wang, Weike , Zhang, Dexue in Algorithms , Circuits , Code reuse

2023

The growing prevalence of embedded systems in various applications has raised concerns about their vulnerability to malicious code reuse attacks. Current software-based and hardware-assisted security techniques struggle to detect or block these attacks with minor performance and implementation overhead. To address this issue, this paper presents a lightweight hardware-assisted scheme to enhance the security of embedded systems against code reuse attacks. We develop an on-chip lightweight hardware shadow stack to validate target addresses at runtime for backward-edge control flow integrity, which backs up valid return addresses during function calls and automatically verifies actual return addresses during the return phase. Additionally, we propose a lightweight stream cipher circuit that encrypts and decrypts critical stack data related to control flow manipulation, preventing attackers from analyzing or tampering with them. When designing and implementing the security mechanism for embedded systems, we fully consider the constraints of limited system resources and performance, optimizing both the architecture design and implementation of the proposed hardware. Finally, we integrate both the proposed lightweight hardware shadow stack and the runtime data encryption hardware into the OR1200 processor. We have verified the system security function on the Terasic DE1-SoC FPGA platform and evaluated the system performance as well as implementation overhead. The results show that the proposed lightweight hardware-assisted scheme can provide a dedicated defense capability against code reuse attacks for embedded systems, with an average system performance overhead of 0.39% and an area footprint of 0.316 mm².
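The shadow-stack idea in this abstract (back up the return address on a call, compare it on return) can be illustrated with a small software model. This is only a sketch under assumptions: the paper implements the check in hardware on the OR1200, and the names `ShadowStack`, `on_call`, `on_return`, and `ShadowStackViolation` are hypothetical, introduced here for illustration. The stream-cipher protection of stack data is not modelled.

```python
class ShadowStackViolation(Exception):
    """Raised when a return address does not match its backed-up copy."""


class ShadowStack:
    """Toy software model of a shadow stack (hypothetical names, not the paper's design)."""

    def __init__(self):
        # Protected copies of return addresses, most recent call last.
        self._saved = []

    def on_call(self, return_address: int) -> None:
        # Call phase: back up the valid return address.
        self._saved.append(return_address)

    def on_return(self, actual_return_address: int) -> None:
        # Return phase: compare the address actually used with the backup.
        if not self._saved:
            raise ShadowStackViolation("return without a matching call")
        expected = self._saved.pop()
        if expected != actual_return_address:
            raise ShadowStackViolation(
                f"expected {expected:#x}, got {actual_return_address:#x}"
            )


if __name__ == "__main__":
    stack = ShadowStack()
    stack.on_call(0x1000)
    stack.on_return(0x1000)  # matches the backup, so no exception
    stack.on_call(0x2000)
    try:
        stack.on_return(0xDEAD)  # simulated overwritten return address
    except ShadowStackViolation as err:
        print("blocked:", err)
```

A mismatch on return is the signal that the on-stack return address was tampered with, which is the condition a backward-edge check is meant to catch.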

Journal Article

Quantifying cross-language code reuse via function-level clone detection

by Zhou, Yan , Rong, Yi in Ablation , Academic plagiarism detection , Artificial neural networks

2025

Code reuse through cloning is common in software development, yet excessive or unchecked cloning can harm maintainability and raise plagiarism concerns. Detecting the proportion of reused (cloned) code in a software project, especially across different programming languages, is a challenging task. This paper defines code reuse proportion detection as measuring how much code in a target program is cloned (identical or similar) from elsewhere. Existing code clone detection techniques perform well in single-language settings but struggle with cross-language clones and do not directly quantify reuse proportion. To address these gaps, we propose a novel cross-language function-level code clone detection approach using a dual embedding Siamese neural network. Our method represents code in Java and Python using a unified abstract syntax structure and semantic embeddings, then uses a Siamese deep network to learn language-agnostic similarities. We also introduce a metric to quantify the clone-based reuse ratio for each function or program. Experiments on three public datasets (including a Java clone benchmark, a Python code clone corpus, and a cross-language Java–Python clone dataset) show that our approach outperforms ten baseline methods, including state-of-the-art and classical clone detectors. Ablation studies confirm the contribution of each component (structural embeddings, cross-language alignment, and contrastive learning) to performance gains. Our model achieves new state-of-the-art accuracy in code clone detection, enabling precise measurement of code reuse. These results demonstrate that the proposed approach can effectively detect cross-language code clones and quantify reuse proportion, benefiting software plagiarism detection and code quality assessment in multi-language projects.
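The abstract mentions a clone-based reuse ratio but does not give its formula. As a rough illustration of what such a measure could look like, the sketch below computes the share of a program's lines that belong to functions flagged as clones. The line weighting, the `0.8` threshold, and the names `reuse_ratio` and `functions` are all assumptions made here, not the paper's definition.

```python
from typing import Iterable, Tuple


def reuse_ratio(functions: Iterable[Tuple[int, float]], threshold: float = 0.8) -> float:
    """Fraction of lines belonging to functions flagged as clones.

    `functions` holds (lines_of_code, best_similarity) pairs, where the
    similarity score would come from some clone detector. Both the
    threshold and the line weighting are illustrative assumptions.
    """
    total_lines = 0
    cloned_lines = 0
    for lines_of_code, similarity in functions:
        total_lines += lines_of_code
        if similarity >= threshold:
            cloned_lines += lines_of_code
    # An empty program has nothing reused.
    return cloned_lines / total_lines if total_lines else 0.0


if __name__ == "__main__":
    # One 10-line function flagged as a clone, one 30-line function not flagged.
    print(reuse_ratio([(10, 0.9), (30, 0.2)]))
```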

Journal Article

ACE-M: Automated Control Flow Integrity Enforcement Based on MPUs at the Function Level

by Lee, Sungbin , Cho, Jeonghun in Automatic control , Code reuse , Design modifications

2022

Control-flow integrity (CFI) ensures that the execution flow of a program follows the control-flow graph (CFG) determined at compile time. CFI is a security technique designed to prevent runtime attacks such as return-oriented programming (ROP). With the development of the Internet of Things (IoT), the number of embedded devices has increased, and security and protection techniques in embedded systems have become important. Since the hardware-based CFI technique requires separate hardware support, it is difficult to apply to an embedded device that is already deployed. In this paper, we propose a function-level CFI technique named ACE-M, which uses the memory protection unit (MPU) included in most embedded devices. The MPU can assign attributes such as read, write, and execute to a memory area. ACE-M has three steps: (1) initiate: insert an MPU-related function at a specific position; (2) profiling: gather information for MPU configuration, several pieces of which can be determined after the initiation step; (3) set: modify the arguments of the already-inserted function. We propose a design that supports the MPU. In our model, the MPU becomes a control flow monitor that detects control flow errors (CFEs), and the inserted code causes the MPU to act as a control flow checker. If the program deviates from the original control flow, the MPU raises an exception since its corresponding area will not be included in the executable area. This approach not only verifies the target address but also guarantees the running position. Our technique can detect any modification of the program counter (PC) to an arbitrary address.
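The detection principle described here (an instruction fetch outside the currently executable area triggers an MPU exception) can be mimicked with a toy model. This is a sketch under assumptions: a real MPU is configured through hardware registers, and `ToyMpu`, `set_executable`, `fetch`, and `MpuFault` are hypothetical names that do not come from ACE-M.

```python
class MpuFault(Exception):
    """Raised when the program counter leaves the executable region."""


class ToyMpu:
    """Toy model of one executable MPU region (hypothetical, for illustration only)."""

    def __init__(self):
        self._region = None  # (start, end) with end exclusive

    def set_executable(self, start: int, end: int) -> None:
        # Stands in for the inserted MPU-related function that marks the
        # address range of the function about to run as executable.
        self._region = (start, end)

    def fetch(self, pc: int) -> int:
        # Every instruction fetch is checked against the executable region.
        if self._region is None or not (self._region[0] <= pc < self._region[1]):
            raise MpuFault(f"fetch from non-executable address {pc:#x}")
        return pc


if __name__ == "__main__":
    mpu = ToyMpu()
    mpu.set_executable(0x4000, 0x4100)  # assumed address range of the current function
    mpu.fetch(0x4010)                   # inside the region, allowed
    try:
        mpu.fetch(0x9000)               # PC redirected to an arbitrary address
    except MpuFault as err:
        print("control flow error:", err)
```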

Journal Article

ParadisEO: A Framework for the Reusable Design of Parallel and Distributed Metaheuristics

by Melab, N. , Cahon, S. , Talbi, E.-G. in Code reuse , Data mining , Distributed processing

2004

In this paper, we present the ParadisEO white-box object-oriented framework dedicated to the reusable design of parallel and distributed metaheuristics (PDM). ParadisEO provides a broad range of features including evolutionary algorithms (EA), local searches (LS), the most common parallel and distributed models, and hybridization mechanisms. This rich content and utility encourage its use at the European level. ParadisEO is based on a clear conceptual separation of the solution methods from the problems they are intended to solve. This separation gives the user maximum code and design reuse. Furthermore, the fine-grained nature of the classes provided by the framework allows higher flexibility compared to other frameworks. ParadisEO is one of the rare frameworks that provide the most common parallel and distributed models. Their implementation is portable on distributed-memory machines as well as on shared-memory multiprocessors, as it uses standard libraries such as MPI, PVM and PThreads. The models can be exploited in a transparent way: one just has to instantiate their associated provided classes. Their experimentation on the radio network design real-world application demonstrates their efficiency.

Journal Article

MetPy

by Marsh, Patrick T. , Leeman, John R. , Manser, Russell P. in Algorithms , Arrays , Atmospheric sciences

2022

MetPy is an open-source, Python-based package for meteorology, providing domain-specific functionality built extensively on top of the robust scientific Python software stack, which includes libraries like NumPy, SciPy, Matplotlib, and xarray. The goal of the project is to bring the weather analysis capabilities of GEMPAK (and similar software tools) into a modern computing paradigm. MetPy strives to employ best practices in its development, including software tests, continuous integration, and automated publishing of web-based documentation. As such, MetPy represents a sustainable, long-term project that fills a need for the meteorological community. MetPy’s development is substantially driven by its user community, both through feedback on a variety of open, public forums like Stack Overflow, and through code contributions facilitated by the GitHub collaborative software development platform. MetPy has recently seen the release of version 1.0, with robust functionality for analyzing and visualizing meteorological datasets. While previous versions of MetPy have already seen extensive use, the 1.0 release represents a significant milestone in terms of completeness and a commitment to long-term support for the programming interfaces. This article provides an overview of MetPy’s suite of capabilities, including its use of labeled arrays and physical unit information as its core data model, unit-aware calculations, cross sections, skew T and GEMPAK-like plotting, station model plots, and support for parsing a variety of meteorological data formats. The general road map for future planned development for MetPy is also discussed.

Journal Article

Improving IoT Cybersecurity Performance with Lifecycle-Motivated Bit-Manipulation Compiler Optimizations

by Budiul, Alexia , Pungilă, Ciprian in Algorithms , Architecture , Code reuse

2026

Implementing cryptographic primitives on resource-constrained IoT devices involves tight latency, code-size, and energy budgets. This work proposes a general LLVM backend instruction-selection strategy that recognizes single-bit update idioms (typically expressed as LOAD-(AND/OR)-STORE sequences in SHA-256 and similar bit-oriented code) and lowers them to the most efficient target-specific bit-manipulation primitive when legality and cost conditions are met. As a concrete instantiation, we implement the strategy for the Renesas RL78/G23 ISA by rewriting eligible patterns into SET1/CLR1 instructions when the constant mask targets exactly one bit. We evaluate the resulting backend on an RL78/G23 platform using cycle counts and code size (bytes) across SHA-256-driven workloads motivated by firmware integrity checking, Merkle-tree hashing, HMAC-based authentication, password-based key derivation (PBKDF2), and chunk-level update validation. The observed cycle reductions are also converted to absolute time across the device’s supported on-chip oscillator frequencies to quantify latency impact under different clocking modes. The experimental validation in this work is limited to the RL78/G23 backend implementation. The underlying instruction-selection idea may be adaptable to other RL78-family devices or to other embedded architectures that provide equivalent single-bit set/clear or bitfield operations; however, such adaptations require target-specific legality checks, cost modeling, and separate experimental validation.
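The condition "the constant mask targets exactly one bit" can be checked with a standard power-of-two test. The sketch below is not the paper's LLVM backend code: it is a Python illustration in which the function names, the 8-bit default operand width, and the returned tuples are assumptions, and the legality and cost conditions mentioned in the abstract are not modelled.

```python
from typing import Optional, Tuple


def single_bit_index(value: int) -> Optional[int]:
    """Return the index of the only set bit, or None if zero or several bits are set."""
    if value > 0 and value & (value - 1) == 0:
        return value.bit_length() - 1
    return None


def classify_update(op: str, mask: int, width: int = 8) -> Optional[Tuple[str, int]]:
    """Classify a load/op/store update with a constant mask (illustrative only).

    An OR whose mask sets exactly one bit maps to a set-bit primitive; an AND
    whose mask clears exactly one bit maps to a clear-bit primitive. The
    8-bit default width is an assumption of this sketch.
    """
    full = (1 << width) - 1
    mask &= full
    if op == "or":
        bit = single_bit_index(mask)
        return ("SET1", bit) if bit is not None else None
    if op == "and":
        # The cleared bits are the zero bits of the mask within the operand width.
        bit = single_bit_index(~mask & full)
        return ("CLR1", bit) if bit is not None else None
    return None


if __name__ == "__main__":
    print(classify_update("or", 0x08))   # x |= 0x08 sets only bit 3
    print(classify_update("and", 0xF7))  # x &= 0xF7 clears only bit 3
    print(classify_update("or", 0x0C))   # two bits set, so not a single-bit idiom
```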

Journal Article

Predicting software reuse using machine learning techniques—A case study on open-source Java software systems

by Lim, Mei Kuan , Yee Yen, Yuen , Yeow, Matthew Yit Hang in Algorithms , Artifact identification , Automation

2025

Software reuse is an essential practice to increase efficiency and reduce costs in software production. Software reuse practices include reusing artifacts, libraries, components, packages, and APIs. Identifying suitable software for reuse requires pinpointing potential candidates. However, there are no objective methods in place to measure software reuse. This makes it challenging to identify highly reusable software. Software reuse research mainly addresses two hurdles: 1) identifying reusable candidates effectively and efficiently, and 2) selecting high-quality software components that improve maintainability and extensibility. This paper proposes automating software reuse prediction by leveraging machine learning (ML) algorithms, enabling future research and practitioners to better identify highly reusable software. Our approach uses cross-project code clone detection to establish the ground truth for software reuse, identifying code clones across popular GitHub projects as indicators of potential reuse candidates. Software metrics were extracted from Maven artifacts and used to train classification and regression models to predict and estimate software reuse. The average F1-score of the ML classification models is 77.19%. The best-performing model, Ridge Regression, achieved an F1-score of 79.17%. Additionally, this research aims to assist developers by identifying key metrics that significantly impact software reuse. Our findings suggest that the file-level PUA (Public Undocumented API) metric is the most important factor influencing software reuse. We also present suitable value ranges for the top five important metrics that developers can follow to create highly reusable software. Furthermore, we developed a tool that utilizes the trained models to predict the reuse potential of existing GitHub projects and rank Maven artifacts by their domain.

Journal Article

Why reinventing the wheels? An empirical study on library reuse and re-implementation

by Thung, Ferdian , Lo, David , Foutse, Khomh in Downloading , Empirical analysis , Libraries

2020

Nowadays, with the rapid growth of open source software (OSS), library reuse is becoming more and more popular, since a large number of third-party libraries are available to download and reuse. A deeper understanding of why developers reuse a library (i.e., replace self-implemented code with an external library) or re-implement a library (i.e., replace an imported external library with self-implemented code) could help researchers better understand the factors that developers are concerned with when reusing code. This understanding can then be used to improve existing libraries and API recommendation tools for researchers and practitioners by using the developers' concerns identified in this study as design criteria. In this work, we investigated the reasons behind library reuse and re-implementation. To achieve this goal, we first crawled data from two popular sources, F-Droid and GitHub. Then, potential instances of library reuse and re-implementation were found automatically based on certain heuristics. Next, we manually checked whether each instance was valid. For library re-implementation, we obtained 82 instances distributed across 75 repositories. We then conducted two types of surveys (an individual survey sent to the developers of the validated instances, and an open survey) on library reuse and re-implementation. For the library reuse individual survey, we received 36 responses from 139 contacted developers. For the re-implementation individual survey, we received 13 responses from 71 contacted developers. In addition, we received 56 responses from the open survey. Finally, we performed qualitative and quantitative analyses of the survey responses and the commit logs of the validated instances. The results suggest that library reuse occurs mainly because developers were initially unaware of the library or the library had not been introduced. Re-implementation occurs mainly because the used library method is only a small part of the library, the library dependencies are too complicated, or the library method is deprecated. Finally, based on all findings obtained from analyzing the surveys and commit messages, we provided a few suggestions to improve the current library recommendation systems: tailored recommendation according to users’ preferences, detection of external code that is similar to a part of the users’ code (to avoid duplication or re-implementation), grouping similar recommendations for developers to compare and select the one they prefer, and disrecommendation of poor-quality libraries.

Journal Article

Automatically Attributing Mobile Threat Actors by Vectorized ATT&CK Matrix and Paired Indicator

by Lee, Kyungho , Kim, Kyoungmin , Shin, Youngsup in Automation , Code reuse , Communication

2021

During the past decade, mobile attacks have been established as an indispensable attack vector adopted by Advanced Persistent Threat (APT) groups. The ubiquitous nature of the smartphone has allowed users to use mobile payments and store private or sensitive data (e.g., login credentials). Consequently, various APT groups have focused on exploiting these vulnerabilities. Past studies have proposed automated classification and detection methods, while few studies have covered cyber attribution. Our study introduces an automated system that focuses on cyber attribution. Adopting MITRE’s ATT&CK for mobile, we performed our study using tactics, techniques, and procedures (TTPs). By comparing indicators of compromise (IoCs), we were able to help reduce false flags during our experiment. Moreover, we examined 12 threat actors and 120 malware samples using the automated method for detecting cyber attribution.

Journal Article
