Catalogue Search | MBRL

A study on prompt design, advantages and limitations of ChatGPT for deep learning program repair

by Cao, Jialun , Wen, Ming , Cheung, Shing-Chi in Artificial Intelligence , Chatbots , Computer Science

2025

The emergence of large language models (LLMs) such as ChatGPT has revolutionized many fields. In particular, recent advances in LLMs have triggered various studies examining the use of these models for software development tasks, such as program repair, code understanding, and code generation. Prior studies have shown the capability of ChatGPT in repairing conventional programs. However, debugging deep learning (DL) programs poses unique challenges since the decision logic is not directly encoded in the source code. This requires LLMs to not only parse the source code syntactically but also understand the intention of DL programs. Therefore, ChatGPT’s capability in repairing DL programs remains unknown. To fill this gap, our study aims to answer three research questions: (1) Can ChatGPT debug DL programs effectively? (2) How can ChatGPT’s repair performance be improved by prompting? (3) In which way can dialogue help facilitate the repair? Our study analyzes the typical information that is useful for prompt design and suggests enhanced prompt templates that are more efficient for repairing DL programs. On top of them, we summarize the dual perspectives (i.e., advantages and disadvantages) of ChatGPT’s ability, such as its handling of API misuse and recommendation, and its shortcomings in identifying default parameters. Our findings indicate that ChatGPT has the potential to repair DL programs effectively and that prompt engineering and dialogue can further improve its performance by providing more code intention. We also identified the key intentions that can enhance ChatGPT’s program repairing capability.

Journal Article


To what extent do DNN-based image classification models make unreliable inferences?

by Tian Yongqiang , Wen, Ming , Shing-Chi, Cheung in Accuracy , Artificial neural networks , Classification

2021

Deep Neural Network (DNN) models are widely used for image classification. While they offer high performance in terms of accuracy, researchers are concerned about whether these models inappropriately make inferences using features irrelevant to the target object in a given image. To address this concern, we propose a metamorphic testing approach that assesses whether a given inference is made based on irrelevant features. Specifically, we propose two metamorphic relations (MRs) to detect such unreliable inferences. These relations expect (a) the classification results with different labels or the same labels but less certainty from models after corrupting the relevant features of images, and (b) the classification results with the same labels after corrupting irrelevant features. The inferences that violate the metamorphic relations are regarded as unreliable inferences. Our evaluation demonstrated that our approach can effectively identify unreliable inferences for single-label classification models with an average precision of 64.1% and 96.4% for the two MRs, respectively. As for multi-label classification models, the corresponding precision for MR-1 and MR-2 is 78.2% and 86.5%, respectively. Further, we conducted an empirical study to understand the problem of unreliable inferences in practice. Specifically, we applied our approach to 18 pre-trained single-label image classification models and 3 multi-label classification models, and then examined their inferences on the ImageNet and COCO datasets. We found that unreliable inferences are pervasive. Specifically, for each model, thousands of correct classifications are actually made using irrelevant features. Next, we investigated the effect of such pervasive unreliable inferences, and found that they can cause significant degradation of a model’s overall accuracy. After including these unreliable inferences from the test set, the model’s accuracy can be significantly changed. Therefore, we recommend that developers pay more attention to these unreliable inferences during model evaluation. We also explored the correlation between model accuracy and the size of unreliable inferences. We found that inferences on inputs with smaller objects are more likely to be unreliable. Lastly, we found that the current model training methodologies can guide the models to learn object-relevant features to a certain extent, but may not necessarily prevent the model from making unreliable inferences. We encourage the community to propose more effective training methodologies to address this issue.
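
The two metamorphic relations described in this abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `Prediction` type, the function names, and the idea of passing in predictions that were already computed on corrupted images are all assumptions made for illustration. How the relevant and irrelevant features are corrupted is not specified here.

```python
from typing import Tuple

# Hypothetical representation of a model output: (predicted label, confidence).
Prediction = Tuple[str, float]


def violates_mr1(original: Prediction, relevant_corrupted: Prediction) -> bool:
    """MR-1: after corrupting relevant features, the label should change,
    or the label should stay the same with lower confidence."""
    same_label = original[0] == relevant_corrupted[0]
    not_less_certain = relevant_corrupted[1] >= original[1]
    return same_label and not_less_certain


def violates_mr2(original: Prediction, irrelevant_corrupted: Prediction) -> bool:
    """MR-2: after corrupting irrelevant features, the label should not change."""
    return original[0] != irrelevant_corrupted[0]


def is_unreliable(
    original: Prediction,
    relevant_corrupted: Prediction,
    irrelevant_corrupted: Prediction,
) -> bool:
    """An inference is flagged as unreliable if it violates either relation."""
    return violates_mr1(original, relevant_corrupted) or violates_mr2(
        original, irrelevant_corrupted
    )
```

For example, a prediction that stays "cat" with equal or higher confidence after the cat itself is corrupted would be flagged under MR-1, while a prediction that flips to "dog" after only the background is corrupted would be flagged under MR-2.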

Journal Article


DroidLeaks: a comprehensive database of resource leaks in Android apps

by Chang, Xu , Wang, Jue , Wu, Tianyong in Applications programs , Crashes , Data base management systems

2019

Resource leaks in Android apps are pervasive. They can cause serious performance degradation and system crashes. In recent years, many resource leak detection techniques have been proposed to help Android developers correctly manage system resources. Yet, there exist no common databases of real-world bugs for effectively comparing such techniques to understand their strengths and limitations. This paper describes our effort towards constructing such a bug database named DroidLeaks. To extract real resource leak bugs, we mined 124,215 code revisions of 34 popular open-source Android apps. After automated filtering and manual validation, we successfully found 292 fixed resource leak bugs, which cover a diverse set of resource classes, from 32 analyzed apps. To understand these bugs, we conducted an empirical study, which revealed the characteristics of resource leaks in Android apps and common patterns of resource management mistakes made by developers. To further demonstrate the usefulness of our work, we evaluated eight resource leak detectors from both academia and industry on DroidLeaks and performed a detailed analysis of their performance. We release DroidLeaks for public access to support future research.

Journal Article


Climate change and thermal comfort in Hong Kong

by Hart, Melissa Anne , Cheung, Chi Shing Calvin in Air Movements , Air temperature , Animal Physiology

2014

Thermal comfort is a major issue in cities and it is expected to change in the future due to the changing climate. The objective of this paper is to use the universal thermal comfort index (UTCI) to compare the outdoor thermal comfort in Hong Kong in the past (1971–2000) and the future (2046–2065 and 2081–2100). The future climate of Hong Kong was determined by the general circulation model (GCM) simulations of future climate scenarios (A1B and B1) established by the Intergovernmental Panel on Climate Change (IPCC). Three GCMs were chosen, GISS-ER, GFDL-CM2.1 and MRI-CGCM2.3.2, based on their performance in simulating past climate. Through a statistical downscaling procedure, the future climatic variables were transferred to the local scale. The UTCI is calculated by four predicted climate variables: air temperature, wind speed, relative humidity and solar radiation. After a normalisation procedure, future UTCI profiles for the urban area of Hong Kong were created. Comparing the past UTCI (calculated by observation data) and future UTCI, all three GCMs predicted that the future climate scenarios have a higher mode and a higher maximum value. There is a shift from ‘No Thermal Stress’ toward ‘Moderate Heat Stress’ and ‘Strong Heat Stress’ during the period 2046–2065, becoming more severe for the later period (2081–2100). Comparing the two scenarios, B1 exhibited similar projections in the two time periods whereas for A1B there was a significant difference, with both the mode and maximum increasing by 2 °C from 2046–2065 to 2081–2100.

Journal Article


How far are app secrets from being stolen? A case study on Android

by Wei, Lili , Huang, Heqing , Li, Kevin in Applications programs , Case studies , Cloud computing

2025

Android apps can hold their own secret strings, such as cloud service credentials or encryption keys. Leakage of such secret strings can induce unprecedented consequences like monetary losses or leakage of user private information. In practice, various security issues were reported because many apps failed to protect their secrets. However, little is known about the types, usages, exploitability, and consequences of app secret leakage issues. While a large body of literature has been devoted to studying user private information leakage, there is no systematic study characterizing app secret leakage issues. How far are Android app secrets from being stolen? To bridge this gap, we conducted the first systematic study to characterize app secret leakage issues in Android apps based on 575 potential app secrets sampled from 14,665 popular Android apps on Google Play. We summarized the common categories of leaked app secrets, assessed their security impacts and disclosed bad practices in how apps store their secrets. We devised a text mining strategy using regular expressions and demonstrated that numerous app secrets can be easily stolen, even from highly popular Android apps on Google Play. In a follow-up study, we harvested 3,711 distinct exploitable app secrets through automatic analysis. Our findings highlight the prevalence of this problem and call for greater attention to app secret protection.
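
As a rough illustration of regex-based text mining for secrets, the sketch below scans a string for two credential formats. The abstract does not list the patterns the study used, so the two patterns here, along with the names `SECRET_PATTERNS` and `find_secrets`, are assumptions. They were chosen only because these key prefixes are widely documented.

```python
import re
from typing import List, Tuple

# Illustrative patterns only; not the patterns used in the study.
SECRET_PATTERNS = {
    # AWS access key IDs commonly start with "AKIA" followed by 16 characters.
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    # Google API keys commonly start with "AIza" followed by 35 characters.
    "google_api_key": re.compile(r"AIza[0-9A-Za-z_\-]{35}"),
}


def find_secrets(text: str) -> List[Tuple[str, str]]:
    """Return (pattern name, matched string) pairs found in the given text."""
    hits: List[Tuple[str, str]] = []
    for name, pattern in SECRET_PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((name, match.group(0)))
    return hits
```

In practice, such a scan would be run over strings extracted from a decompiled app, and a match only marks a candidate secret. Whether it is exploitable has to be checked separately.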

Journal Article


Toward actionable testing of deep learning models

by Tian, Yongqiang , Liu, Yepang , Cheung, Shing-Chi in Computer Science , Deep learning , Information Systems and Communication Service

2023

Journal Article


Static and Dynamic Optical Analysis of Micro Wrinkle Formation on a Liquid Surface

by Saxena, Antariksh , Tsakonas, Costas , Chappell, David in Contact angle , Decay rate , dielectric material

2021

A spatially periodic voltage was used to create a dielectrophoresis induced periodic micro wrinkle deformation on the surface of a liquid film. Optical Coherence Tomography provided the equilibrium wrinkle profile at submicron accuracy. The dynamic wrinkle amplitude was derived from optical diffraction analysis during sub-millisecond wrinkle formation and decay, after abruptly increasing or reducing the voltage, respectively. The decay time constant closely followed the film thickness dependence expected for surface tension driven viscous levelling. Modelling of the system using numerical solution of the Stokes flow equations with electrostatic forcing predicted that wrinkle formation was faster than decay, in accord with observations.

Journal Article


Performance Aware Service Pool in Dependable Service Oriented Architecture

by Liu, Xuan-Zhe , Mei, Hong , Huang, Gang in Algorithms , Architecture , Construction

2006

As a popular approach to dependable service oriented architecture (SOA), a service pool collects a set of services that provide the same functionality by different service providers for achieving desired reliability. However, if the tradeoff between reliability and other important qualities, e.g., performance, has to be considered, the construction and management of a service pool become much more complex. In this paper, an automated approach to this problem is presented. Based on the investigation of service pools in the typical triangle SOA model, two challenges critical to the effectiveness and efficiency of service pools are identified, including which services should be held by a pool and what order these services are invoked in. A set of algorithms are designed to address the two challenges and then a service pool can be automatically constructed and managed for given reliability and performance requirements in polynomial time. The approach is demonstrated on a J2EE based service platform and the comparison results between different pooling algorithms are evaluated.

Journal Article


MR-Coupler: Automated Metamorphic Test Generation via Functional Coupling Analysis

by Xu, Congying , Zhu, Hengcheng , Terragni, Valerio in Couplers , Coupling , Effectiveness

2026

Metamorphic testing (MT) is a widely recognized technique for alleviating the oracle problem in software testing. However, its adoption is hindered by the difficulty of constructing effective metamorphic relations (MRs), which often require domain-specific or hard-to-obtain knowledge. In this work, we propose a novel approach that leverages the functional coupling between methods, which is readily available in source code, to automatically construct MRs and generate metamorphic test cases (MTCs). Our technique, MR-Coupler, identifies functionally coupled method pairs, employs large language models to generate candidate MTCs, and validates them through test amplification and mutation analysis. In particular, we leverage three functional coupling features to avoid expensive enumeration of possible method pairs, and a novel validation mechanism to reduce false alarms. Our evaluation of MR-Coupler on 100 human-written MTCs and 50 real-world bugs shows that it generates valid MTCs for over 90% of tasks, improves valid MTC generation by 64.90%, and reduces false alarms by 36.56% compared to baselines. Furthermore, the MTCs generated by MR-Coupler detect 44% of the real bugs. Our results highlight the effectiveness of leveraging functional coupling for automated MR construction and the potential of MR-Coupler to facilitate the adoption of MT in practice. We also released the tool and experimental data to support future research.

Paper


MR-Scout: Automated Synthesis of Metamorphic Relations from Existing Test Cases

by Xu, Congying , Terragni, Valerio , Zhu, Hengcheng in Alliances , Codification , Extractors

2026

Metamorphic Testing (MT) alleviates the oracle problem by defining oracles based on metamorphic relations (MRs) that govern multiple related inputs and their outputs. However, designing MRs is challenging, as it requires domain-specific knowledge. This hinders the widespread adoption of MT. We observe that developer-written test cases can embed domain knowledge that encodes MRs. Such encoded MRs could be synthesized for testing not only their original programs but also other programs that share similar functionalities. In this paper, we propose MR-Scout to automatically synthesize MRs from test cases in open-source software (OSS) projects. MR-Scout first discovers MR-encoded test cases (MTCs), then synthesizes the encoded MRs into parameterized methods (called codified MRs), and finally filters out MRs that demonstrate poor quality for new test case generation. MR-Scout discovered over 11,000 MTCs from 701 OSS projects. Experimental results show that over 97% of codified MRs are of high quality for automated test case generation, demonstrating the practical applicability of MR-Scout. Furthermore, codified-MRs-based tests effectively enhance the test adequacy of programs with developer-written tests, leading to 13.52% and 9.42% increases in line coverage and mutation score, respectively. Our qualitative study shows that 55.76% to 76.92% of codified MRs are easily comprehensible for developers.

Paper
