Catalogue Search | MBRL
Explore the vast range of titles available.
31 result(s) for "Nie, Pengyu"
WinHAP: An Efficient Haplotype Phasing Algorithm Based on Scalable Sliding Windows
2012
Haplotype phasing represents an essential step in studying the association of genomic polymorphisms with complex genetic diseases, and in determining targets for drug design. In recent years, huge amounts of genotype data have been produced by rapidly evolving high-throughput sequencing technologies, and the data volume challenges the community to develop haplotype phasing algorithms that are more efficient in both running time and overall accuracy. 2SNP is one of the fastest haplotype phasing algorithms, with error rates comparable to those of other algorithms. The most time-consuming step of 2SNP is the construction of a maximum spanning tree (MST) among all the heterozygous SNP pairs. We simplified this step by replacing the MST with the initial haplotypes of adjacent heterozygous SNP pairs. The multi-SNP haplotypes were estimated within a sliding window along the chromosomes. Comparative studies on four genotype datasets of different scales suggest that our algorithm, WinHAP, outperforms 2SNP and most other haplotype phasing algorithms in terms of both running speed and overall accuracy. To facilitate WinHAP's application to practical biological datasets, we released the software for free at: http://staff.ustc.edu.cn/~xuyun/winhap/index.htm.
Journal Article
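To make the sliding-window idea in the WinHAP abstract concrete, here is a minimal Python sketch of phasing within overlapping windows and stitching the local results together; the window size, step, and the phase_window solver are illustrative placeholders, not WinHAP's actual implementation.

```python
def sliding_windows(n_sites, window=10, step=5):
    """Yield (start, end) index pairs of overlapping windows over n_sites SNP positions."""
    start = 0
    while start < n_sites:
        yield start, min(start + window, n_sites)
        start += step

def phase_genotypes(genotypes, phase_window):
    """genotypes: per-site genotype codes; phase_window: local solver for one window."""
    haplotype_a = [None] * len(genotypes)
    haplotype_b = [None] * len(genotypes)
    for start, end in sliding_windows(len(genotypes)):
        local_a, local_b = phase_window(genotypes[start:end])
        for i, (a, b) in enumerate(zip(local_a, local_b), start):
            # Keep the first assignment; overlapping windows only fill unresolved sites here.
            if haplotype_a[i] is None:
                haplotype_a[i], haplotype_b[i] = a, b
    return haplotype_a, haplotype_b

# Toy usage with a trivial local solver (0 -> 0/0, 1 -> 0/1, 2 -> 1/1).
demo = phase_genotypes([0, 1, 2, 1, 0, 2, 1, 1],
                       lambda g: ([x // 2 for x in g], [x - x // 2 for x in g]))
```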
Swarm Intelligence-Enhanced TimesNet for Adaptive Residual Energy Prediction in Power Communication Networks
2025
Predicting residual energy in communication nodes of private power networks is crucial for maintaining stable power grids, yet current methods often fail to capture the complex relationships between periodic features and sudden energy fluctuations in power time-series data. This article proposes a novel prediction model integrating TimesNet and a swarm intelligence-based adaptive attention reweighting (SI-AAR) mechanism. The model employs channel-independent slicing to encode heterogeneous data, extracts dynamic patterns through period matrix reconstruction and Inception convolution, and dynamically allocates attention weights using neighboring node information via the SI-AAR module to enhance spatial anomaly detection. Experimental results on real datasets demonstrate that the proposed method reduces Mean Squared Error (MSE) and Mean Absolute Error (MAE) by 8.18% and 11.19%, respectively, compared to TimesNet. Ablation studies highlight the SI-AAR module's significant contribution, achieving a 7.66% reduction in MSE.
Journal Article
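As a rough illustration of reweighting a node's prediction using neighboring nodes, one could imagine a similarity-based softmax over neighbor readings; this toy sketch is not the paper's actual SI-AAR mechanism, and all names are hypothetical.

```python
import numpy as np

def neighbor_attention_weights(node_series, neighbor_series):
    """Toy illustration: weight neighbors by similarity (negative distance) via softmax."""
    scores = np.array([-np.linalg.norm(node_series - nb) for nb in neighbor_series])
    exp = np.exp(scores - scores.max())
    return exp / exp.sum()

# Usage: blend neighbor readings using the weights to inform the node's own forecast.
node = np.array([0.90, 0.80, 0.70])
neighbors = [np.array([0.88, 0.79, 0.71]), np.array([0.50, 0.40, 0.30])]
weights = neighbor_attention_weights(node, neighbors)
blended = sum(w * nb for w, nb in zip(weights, neighbors))
```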
Efficient Incremental Code Coverage Analysis for Regression Test Suites
by Nie, Pengyu; Wang, Jiale Amber; Wang, Kaiyuan
in Building codes; Code coverage analysis; Open source software
2024
Code coverage analysis has been widely adopted in the continuous integration of open-source and industry software repositories to monitor the adequacy of regression test suites. However, computing code coverage can be costly, introducing significant overhead during test execution. Plus, re-collecting code coverage for the entire test suite is usually unnecessary when only a part of the coverage data is affected by code changes. While regression test selection (RTS) techniques exist to select a subset of tests whose behaviors may be affected by code changes, they are not compatible with code coverage analysis techniques -- that is, simply executing RTS-selected tests leads to incorrect code coverage results. In this paper, we present the first incremental code coverage analysis technique, which speeds up code coverage analysis by executing a minimal subset of tests to update the coverage data affected by code changes. We implement our technique in a tool dubbed iJaCoCo, which builds on Ekstazi and JaCoCo -- the state-of-the-art RTS and code coverage analysis tools for Java. We evaluate iJaCoCo on 1,122 versions from 22 open-source repositories and show that iJaCoCo can speed up code coverage analysis time by an average of 1.86x and up to 8.20x compared to JaCoCo.
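A minimal sketch of the incremental idea described above, assuming per-test coverage and per-test class dependencies were cached from a prior run; the function and variable names are hypothetical and this is not iJaCoCo's actual implementation.

```python
def incremental_coverage(cached_coverage, test_dependencies, changed_classes, run_with_coverage):
    """cached_coverage: {test: set(covered lines)}; test_dependencies: {test: set(classes)}."""
    # Re-run only tests whose dependencies intersect the changed classes.
    affected = {t for t, deps in test_dependencies.items() if deps & changed_classes}
    updated = dict(cached_coverage)
    for test in affected:
        updated[test] = run_with_coverage(test)   # fresh coverage for affected tests only
    # Merge per-test coverage into suite-level coverage.
    merged = set()
    for lines in updated.values():
        merged |= lines
    return updated, merged
```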
Multilingual Code Co-Evolution Using Large Language Models
2023
Many software projects implement APIs and algorithms in multiple programming languages. Maintaining such projects is tiresome, as developers have to ensure that any change (e.g., a bug fix or a new feature) is propagated to the implementations in other programming languages in a timely manner and without errors. In the world of ever-changing software, rule-based translation tools (i.e., transpilers) and machine learning models for translating code from one language to another provide limited value: translating the entire codebase from one language to another on every change is not how developers work. In this paper, we target a novel task: translating code changes from one programming language to another using large language models (LLMs). We design and implement the first LLM for this task, dubbed Codeditor. Codeditor explicitly models code changes as edit sequences and learns to correlate changes across programming languages. To evaluate Codeditor, we collect a corpus of 6,613 aligned code changes from 8 pairs of open-source software projects implementing similar functionality in two programming languages (Java and C#). Results show that Codeditor outperforms state-of-the-art approaches by a large margin on all commonly used automatic metrics. Our work also reveals that Codeditor is complementary to existing generation-based models, and that combining the two yields even better performance.
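As a rough illustration of modeling a code change as an edit sequence (the abstract's framing, not Codeditor's actual encoding), a diff can be reduced to a list of replace/insert/delete operations that a model could learn to mirror in another language.

```python
import difflib

def to_edit_sequence(old_lines, new_lines):
    """Represent a code change as a sequence of (operation, old span, new span) edits."""
    ops = difflib.SequenceMatcher(None, old_lines, new_lines).get_opcodes()
    return [(tag, old_lines[i1:i2], new_lines[j1:j2])
            for tag, i1, i2, j1, j2 in ops if tag != "equal"]

# e.g., a Java bug fix rendered as an edit sequence
java_before = ["int max(int a, int b) {", "  return a > b ? b : a;", "}"]
java_after  = ["int max(int a, int b) {", "  return a > b ? a : b;", "}"]
print(to_edit_sequence(java_before, java_after))
# [('replace', ['  return a > b ? b : a;'], ['  return a > b ? a : b;'])]
```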
InterTrans: Leveraging Transitive Intermediate Translations to Enhance LLM-based Code Translation
2024
Code translation aims to convert a program from one programming language (PL) to another. This long-standing software engineering task is crucial for modernizing legacy systems, ensuring cross-platform compatibility, enhancing performance, and more. However, automating this process remains challenging due to the many syntactic and semantic differences between PLs. Recent studies show that even advanced techniques such as large language models (LLMs), especially open-source LLMs, still struggle with the task. Currently, code LLMs are trained on source code from multiple programming languages and thus exhibit multilingual capabilities. In this paper, we investigate whether such multilingual capabilities can be harnessed to enhance code translation. To achieve this goal, we introduce InterTrans, an LLM-based automated code translation approach that, in contrast to existing approaches, leverages intermediate translations across PLs to bridge the syntactic and semantic gaps between source and target PLs. InterTrans contains two stages. It first uses a novel Tree of Code Translation (ToCT) algorithm to plan transitive intermediate translation sequences between a given source and target PL, and then validates them in a specific order. We evaluate InterTrans with three open LLMs on three benchmarks (i.e., CodeNet, HumanEval-X, and TransCoder) involving six PLs. Results show an absolute improvement of 18.3% to 43.3% in Computation Accuracy (CA) for InterTrans over Direct Translation with 10 attempts. The best-performing variant of InterTrans (with the Magicoder LLM) achieved an average CA of 87.3%-95.4% on the three benchmarks.
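A toy sketch of planning transitive translation paths in the spirit of ToCT; the actual algorithm builds and validates a tree of translations, so the simplified enumeration below is only illustrative and its names are hypothetical.

```python
from itertools import permutations

def candidate_translation_paths(source, target, languages, max_hops=2):
    """Enumerate paths source -> intermediates -> target, with up to max_hops intermediates."""
    intermediates = [lang for lang in languages if lang not in (source, target)]
    paths = [[source, target]]                      # direct translation is tried first
    for k in range(1, max_hops + 1):
        for combo in permutations(intermediates, k):
            paths.append([source, *combo, target])
    return paths

# e.g., paths from Java to Rust via Python/C++ intermediates
print(candidate_translation_paths("Java", "Rust", ["Java", "Python", "C++", "Rust"]))
```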
Inline Tests
2022
Unit tests are widely used to check source code quality, but they can be too coarse-grained or ill-suited for testing individual program statements. We introduce inline tests to make it easier to check for faults in statements. We motivate inline tests through several language features and a common testing scenario in which inline tests could be beneficial. For example, inline tests can allow a developer to test a regular expression in place. We also define language-agnostic requirements for inline testing frameworks. Lastly, we implement I-Test, the first inline testing framework. I-Test works for Python and Java, and it satisfies most of the requirements. We evaluate I-Test on open-source projects by using it to test 144 statements in 31 Python programs and 37 Java programs. We also perform a user study. All nine user study participants say that inline tests are easy to write and that inline testing is beneficial. The cost of running inline tests is negligible, at 0.007x--0.014x, and our inline tests helped find two faults that have been fixed by the developers.
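For instance, an in-place check of a regular expression, the scenario mentioned in the abstract, might conceptually look like the following; the plain asserts are only a stand-in for the dedicated inline-test syntax that I-Test provides.

```python
import re

def parse_major_minor(version):
    # Target statement containing a regex the developer wants to check in place.
    return re.match(r"(\d+)\.(\d+)", version).groups()

# Conceptual in-place checks: supply concrete inputs for the target statement
# and assert its expected output, right next to the code under test.
assert parse_major_minor("3.11.2") == ("3", "11")
assert parse_major_minor("10.0") == ("10", "0")
```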
Generating Exceptional Behavior Tests with Reasoning Augmented Large Language Models
by Gligoric, Milos; Liu, Yu; Li, Jessy Junyi
in Large language models; Programming languages; Python
2024
Many popular programming languages, including C#, Java, and Python, support exceptions. Exceptions are thrown during program execution if an unwanted event happens, e.g., a method is invoked with an illegal argument value. Software developers write exceptional behavior tests (EBTs) to check that their code detects unwanted events and throws appropriate exceptions. Prior research studies have shown the importance of EBTs, but those studies also highlighted that developers put most of their efforts on "happy paths", e.g., paths without unwanted events. To help developers fill the gap, we present the first framework, dubbed EXLONG, that automatically generates EBTs. EXLONG is a large language model instruction-tuned from CodeLlama and embeds reasoning about traces that lead to throw statements, conditional expressions that guard throw statements, and non-exceptional behavior tests that execute similar traces. We compare EXLONG with the state-of-the-art models for test generation (CAT-LM) and one of the strongest foundation models (GPT3.5), as well as with analysis-based tools for test generation (Randoop and EvoSuite). Our results show that EXLONG outperforms existing models and tools. Furthermore, we contributed several pull requests to open-source projects and 23 EBTs generated by EXLONG were already accepted.
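For context, an exceptional behavior test exercises the error path rather than the happy path. EXLONG targets Java, but the same idea expressed in Python with pytest (shown here purely for illustration, with hypothetical names) looks like this:

```python
import pytest

def set_timeout(seconds):
    # Code under test: rejects an illegal argument value by raising an exception.
    if seconds <= 0:
        raise ValueError("timeout must be positive")
    return seconds

def test_set_timeout_rejects_nonpositive():
    # Exceptional behavior test: checks the throw path, not the "happy path".
    with pytest.raises(ValueError):
        set_timeout(0)
```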
pytest-inline: An Inline Testing Tool for Python
2023
We present pytest-inline, the first inline testing framework for Python. We recently proposed inline tests to make it easier to test individual program statements. But, there is no framework-level support for developers to write inline tests in Python. To fill this gap, we design and implement pytest-inline as a plugin for pytest, the most popular Python testing framework. Using pytest-inline, a developer can write an inline test by assigning test inputs to variables in a target statement and specifying the expected test output. Then, pytest-inline runs each inline test and fails if the target statement's output does not match the expected output. In this paper, we describe our design of pytest-inline, the testing features that it provides, and the intended use cases. Our evaluation on inline tests that we wrote for 80 target statements from 31 open-source Python projects shows that using pytest-inline incurs negligible overhead, at 0.012x. pytest-inline is integrated into the pytest-dev organization, and a video demo is at https://www.youtube.com/watch?v=pZgiAxR_uJg.
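Conceptually, an inline test binds concrete inputs to the variables used by a target statement and states that statement's expected output; the helper below is a hypothetical stand-in meant to convey the workflow, not pytest-inline's actual API.

```python
def run_inline_test(target, given, expect):
    """Evaluate the target statement on the given inputs and compare to the expected output."""
    result = target(**given)
    assert result == expect, f"inline test failed: {result!r} != {expect!r}"

# Target statement: s.strip().lower()
run_inline_test(lambda s: s.strip().lower(), given={"s": "  MBRL  "}, expect="mbrl")
```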
Learning Deep Semantics for Test Completion
2023
Writing tests is a time-consuming yet essential task during software development. We propose to leverage recent advances in deep learning for text and code generation to assist developers in writing tests. We formalize the novel task of test completion to automatically complete the next statement in a test method based on the context of prior statements and the code under test. We develop TeCo -- a deep learning model using code semantics for test completion. The key insight underlying TeCo is that predicting the next statement in a test method requires reasoning about code execution, which is hard to do with only the syntax-level data that existing code completion models use. TeCo extracts and uses six kinds of code semantics data, including the execution result of prior statements and the execution context of the test method. To provide a testbed for this new task, as well as to evaluate TeCo, we collect a corpus of 130,934 test methods from 1,270 open-source Java projects. Our results show that TeCo achieves an exact-match accuracy of 18, which is 29% higher than the best baseline using syntax-level data only. When measuring the functional correctness of the generated next statement, TeCo can generate runnable code in 29% of the cases compared to 18% obtained by the best baseline. Moreover, TeCo is significantly better than prior work on test oracle generation.
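As a loose illustration of assembling context for test completion (not TeCo's actual feature extraction, which relies on execution-derived semantics), the model's input could combine prior test statements with information about the code under test; all names and the format below are hypothetical.

```python
def build_completion_context(prior_statements, method_under_test_signature, observed_values):
    """Assemble a textual context from which a model predicts the next test statement."""
    return "\n".join([
        f"// under test: {method_under_test_signature}",
        f"// observed values: {observed_values}",   # stand-in for execution-derived semantics
        *prior_statements,
        "// predict next statement:",
    ])

ctx = build_completion_context(
    prior_statements=["Stack<Integer> s = new Stack<>();", "s.push(1);"],
    method_under_test_signature="int Stack.pop()",
    observed_values={"s.size()": 1},
)
print(ctx)
```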
Roosterize: Suggesting Lemma Names for Coq Verification Projects Using Deep Learning
2021
Naming conventions are an important concern in large verification projects using proof assistants, such as Coq. In particular, lemma names are used by proof engineers to effectively understand and modify Coq code. However, providing accurate and informative lemma names is a complex task, which is currently often carried out manually. Even when lemma naming is automated using rule-based tools, generated names may fail to adhere to important conventions not specified explicitly. We demonstrate a toolchain, dubbed Roosterize, which automatically suggests lemma names in Coq projects. Roosterize leverages a neural network model trained on existing Coq code, thus avoiding manual specification of naming conventions. To allow proof engineers to conveniently access suggestions from Roosterize during Coq project development, we integrated the toolchain into the popular Visual Studio Code editor. Our evaluation shows that Roosterize substantially outperforms strong baselines for suggesting lemma names and is useful in practice. The demo video for Roosterize can be viewed at: https://youtu.be/HZ5ac7Q14rc.