Catalogue Search | MBRL
Explore the vast range of titles available.
114 result(s) for "Chen, Zizhong"
An Ensemble and Iterative Recovery Strategy Based kGNN Method to Edit Data with Label Noise
2022
Learning label noise is gaining increasing attention from a variety of disciplines, particularly in supervised machine learning for classification tasks. The k nearest neighbors (kNN) classifier is often used as a natural way to edit the training sets due to its sensitivity to label noise. However, the kNN-based editor may remove too many instances if not designed to take care of the label noise. In addition, the one-sided nearest neighbor (NN) rule is unconvincing, as it just considers the nearest neighbors from the perspective of the query sample. In this paper, we propose an ensemble and iterative recovery strategy-based kGNN method (EIRS-kGNN) to edit data with label noise. EIRS-kGNN first uses the general nearest neighbors (GNN) to expand the one-sided NN rule to a binary-sided NN rule, taking the neighborhood of the queried samples into account. Then, it ensembles the prediction results of a finite set of ks in the kGNN to prudently judge the noise levels for each sample. Finally, two loops, i.e., the inner loop and the outer loop, are leveraged to iteratively detect label noise. A frequency indicator is derived from the iterative processes to guide the mixture approaches, including relabeling and removing, to deal with the detected label noise. The goal of EIRS-kGNN is to recover the distribution of the data set as if it were not corrupted. Experimental results on both synthetic data sets and UCI benchmarks, including binary data sets and multi-class data sets, demonstrate the effectiveness of the proposed EIRS-kGNN method.
Journal Article
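The one-sided NN editing rule that EIRS-kGNN improves on can be sketched in a few lines of plain Python: flag any sample whose label disagrees with the majority of its k nearest neighbors. This is a minimal illustration of the baseline rule, not the paper's EIRS-kGNN method; the data set and k below are made up for the example.

```python
def knn_label_noise_filter(X, y, k=3):
    """Flag samples whose label disagrees with the majority label of
    their k nearest neighbors (squared Euclidean distance) -- the
    classic one-sided NN editing rule."""
    flagged = []
    for i, xi in enumerate(X):
        # distances from sample i to every other sample
        dists = sorted(
            (sum((a - b) ** 2 for a, b in zip(xi, xj)), j)
            for j, xj in enumerate(X) if j != i
        )
        neighbors = [y[j] for _, j in dists[:k]]
        majority = max(set(neighbors), key=neighbors.count)
        if majority != y[i]:
            flagged.append(i)
    return flagged

# Two well-separated clusters; index 3 carries label noise.
X = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
     (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
y = [0, 0, 0, 1, 1, 1, 1]
print(knn_label_noise_filter(X, y))  # → [3]
```

EIRS-kGNN replaces this one-sided rule with a binary-sided GNN rule, ensembles over several values of k, and iterates before deciding to relabel or remove.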
Condition Numbers of Gaussian Random Matrices
2005
Let $G_{m \times n}$ be an $m \times n$ real random matrix whose elements are independent and identically distributed standard normal random variables, and let $\kappa_2(G_{m \times n})$ be the 2-norm condition number of $G_{m \times n}$. We prove that, for any $m \geq 2$, $n \geq 2$, and $x \geq |n-m|+1$, $\kappa_2(G_{m \times n})$ satisfies $\frac{1}{\sqrt{2\pi}} (c/x)^{|n-m|+1} < P\left(\frac{\kappa_2(G_{m \times n})}{n/(|n-m|+1)} > x\right) < \frac{1}{\sqrt{2\pi}} (C/x)^{|n-m|+1}$, where $0.245 \leq c \leq 2.000$ and $5.013 \leq C \leq 6.414$ are universal positive constants independent of $m$, $n$, and $x$. Moreover, for any $m \geq 2$ and $n \geq 2$, $E(\log \kappa_2(G_{m \times n})) < \log \frac{n}{|n-m|+1} + 2.258$. A similar pair of results for complex Gaussian random matrices is also established.
Journal Article
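The quantity bounded above is easy to probe empirically with numpy: sample Gaussian matrices, compute the ratio of extreme singular values, and compare against the $n/(|n-m|+1)$ scaling. The dimensions, seed, and sample count below are illustrative choices, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def kappa2(m, n):
    """2-norm condition number of an m x n standard Gaussian matrix:
    ratio of the largest to the smallest singular value."""
    G = rng.standard_normal((m, n))
    s = np.linalg.svd(G, compute_uv=False)
    return s[0] / s[-1]

# Empirically probe the scaling kappa_2 ~ n/(|n-m|+1) from the bound.
m, n = 100, 100                       # square case: |n-m|+1 = 1
samples = [kappa2(m, n) for _ in range(200)]
scaled = np.median(samples) / (n / (abs(n - m) + 1))
print(f"median kappa_2 / (n/(|n-m|+1)) ≈ {scaled:.2f}")
```

For square matrices the bound predicts heavy tails in this scaled ratio, which the sampled medians reflect.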
Self-supporting Co0.85Se nanosheets anchored on Co plate as highly efficient electrocatalyst for hydrogen evolution reaction in both acidic and alkaline media
by Yang, Hongxiao; Chen, Zizhong; Zhou, Qiuxia
in Atomic/Molecular Structure and Spectra; Biomedicine; Biotechnology
2020
Electrocatalytic water splitting via the hydrogen evolution reaction (HER) represents one of the most promising strategies for obtaining hydrogen energy. In the current work, a self-supporting Co0.85Se nanosheet network anchored on a Co plate (Co0.85Se NSs@Co) is fabricated by employing an easily tailorable Co metal plate as the conductive source substrate. A scalable dealloying and hydrothermal selenization strategy was employed to build a layer of three-dimensional interlinked Co0.85Se nanosheets on the surface of the Co plate. Benefiting from its bulky integrated architecture and rich active sites, the as-made Co0.85Se NSs@Co exhibits superior electrocatalytic activity and long-term catalytic durability toward the HER. It requires overpotentials of only 121 and 162 mV to drive a current density of 10 mA·cm−2 for hydrogen evolution in 0.5 M H2SO4 and 1 M KOH solution, respectively. In particular, no evident activity decay occurs after 1,500 cycles or a continuous 20 h test at 10 mA·cm−2 in either acidic or alkaline electrolytes. With the merits of exceptional performance, scalable production, and low cost, the self-supporting Co0.85Se NSs@Co holds prospective application potential as a stable, binder-free electrocatalyst for hydrogen generation in a wide range of electrolytes.
Journal Article
Improving Energy Saving of One-sided Matrix Decompositions on CPU-GPU Heterogeneous Systems
2023
One-sided dense matrix decompositions (e.g., Cholesky, LU, and QR) are key components of scientific computing in many different fields. Although their design has been highly optimized for modern processors, they still consume a considerable amount of energy. As CPU-GPU heterogeneous systems are commonly used for matrix decompositions, in this work we aim to further improve the energy saving of one-sided matrix decompositions on such systems. We first build an Algorithm-Based Fault Tolerance protected overclocking technique (ABFT-OC) that enables reliable overclocking for key matrix decomposition operations. Then, we design an energy-saving matrix decomposition framework, Bi-directional Slack Reclamation (BSR), that can intelligently combine the capabilities provided by ABFT-OC and DVFS to maximize energy saving while maintaining performance and reliability. Experiments show that BSR is able to save up to 11.7% more energy compared with the current best energy-saving optimization approach with no performance degradation, and achieves up to 14.1% Energy*Delay^2 reduction. BSR also enables a Pareto-efficient performance-energy trade-off, providing up to 1.43x performance improvement without costing extra energy.
FT-GEMM: A Fault Tolerant High Performance GEMM Implementation on x86 CPUs
2023
General matrix/matrix multiplication (GEMM) is crucial for scientific computing and machine learning. However, the increased scale of computing platforms raises concerns about hardware and software reliability. In this poster, we present FT-GEMM, a high-performance GEMM capable of tolerating soft errors on the fly. We incorporate the fault-tolerant functionality at the algorithmic level by fusing the memory-intensive operations into the GEMM assembly kernels. We design a cache-friendly scheme for parallel FT-GEMM. Experimental results on Intel Cascade Lake demonstrate that FT-GEMM offers high reliability and performance -- faster than Intel MKL, OpenBLAS, and BLIS by 3.50%–22.14% for both serial and parallel GEMM, even under hundreds of errors injected per minute.
DGRO: Diameter-Guided Ring Optimization for Integrated Research Infrastructure Membership
by Chen, Zizhong; Raghavan, Krishnan; Sheng, Di
in Configurations; Diameters; Network topologies
2024
The logical ring is a core component of membership protocols. However, the logical ring fails to consider the underlying physical latency, resulting in a high diameter. To address this issue, we introduce Diameter-Guided Ring Optimization (DGRO), which focuses on constructing rings with the smallest possible diameter, selecting the most effective ring configurations, and implementing these configurations in parallel. We first explore an integration of deep Q-learning and graph embedding to optimize the ring topology. We next propose a ring selection strategy that assesses the current topology's average latency against a global benchmark, facilitating integration into modern peer-to-peer protocols and substantially reducing network diameter. To further enhance scalability, we propose a parallel strategy that distributes the topology construction process into separate partitions simultaneously. Our experiments show that: 1) DGRO efficiently constructs a network topology that achieves up to a 60% reduction in diameter compared to the best results from an extensive search over 10^5 topologies, all within a significantly shorter computation time; 2) the ring selection of DGRO reduces the diameter of the state-of-the-art methods Chord, RAPID, and Perigee by 10%–40%, 44%, and 60%, respectively; and 3) the parallel construction scales up to 32 partitions while maintaining the same diameter as the centralized version.
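The diameter objective that DGRO minimizes can be made concrete with a toy computation: on a latency-weighted ring where traffic may travel in either direction, the diameter is the largest all-pairs shortest-path latency. This is a small sketch of the metric only, not of DGRO's Q-learning construction; the topologies below are invented examples.

```python
def ring_diameter(latency):
    """Diameter (max over all node pairs of shortest-path latency) of a
    ring whose consecutive nodes i and i+1 (mod n) are joined by a link
    of weight latency[i]; messages may go either way around the ring."""
    n = len(latency)
    INF = float("inf")
    d = [[0 if i == j else INF for j in range(n)] for i in range(n)]
    for i in range(n):                     # ring edges, both directions
        j = (i + 1) % n
        d[i][j] = d[j][i] = min(d[i][j], latency[i])
    for k in range(n):                     # Floyd-Warshall closure
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return max(max(row) for row in d)

# Uniform 6-node ring with unit latency: the farthest pair is 3 hops away.
print(ring_diameter([1, 1, 1, 1, 1, 1]))   # → 3
```

DGRO searches over which nodes become ring neighbors so that this quantity, measured on physical latencies, stays small.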
Anatomy of High-Performance GEMM with Online Fault Tolerance on GPUs
2023
General Matrix Multiplication (GEMM) is a crucial algorithm for various applications such as machine learning and scientific computing, and an efficient GEMM implementation is essential for the performance of these systems. While researchers often strive for faster performance by using large compute platforms, the increased scale of these systems can raise concerns about hardware and software reliability. In this paper, we present a design for a high-performance GEMM with algorithm-based fault tolerance for use on GPUs. We describe fault-tolerant designs for GEMM at the thread, warp, and threadblock levels, and also provide a baseline GEMM implementation that is competitive with or faster than the state-of-the-art, proprietary cuBLAS GEMM. We present a kernel fusion strategy to overlap and mitigate the memory latency due to fault tolerance with the original GEMM computation. To support a wide range of input matrix shapes and reduce development costs, we present a template-based approach for automatic code generation for both fault-tolerant and non-fault-tolerant GEMM implementations. We evaluate our work on NVIDIA Tesla T4 and A100 server GPUs. Experimental results demonstrate that our baseline GEMM presents comparable or superior performance compared to the closed-source cuBLAS. The fault-tolerant GEMM incurs only a minimal overhead (8.89% on average) compared to cuBLAS even with hundreds of errors injected per minute. For irregularly shaped inputs, the code generator-generated kernels show remarkable speedups of 160%–183.5% and 148.55%–165.12% for fault-tolerant and non-fault-tolerant GEMMs, outperforming cuBLAS by up to 41.40%.
Accelerating MPI Collectives with Process-in-Process-based Multi-object Techniques
2023
In the exascale computing era, optimizing MPI collective performance in high-performance computing (HPC) applications is critical. Current algorithms face performance degradation due to system call overhead, page faults, or data-copy latency, affecting HPC applications' efficiency and scalability. To address these issues, we propose PiP-MColl, a Process-in-Process-based Multi-object Inter-process MPI Collective design that maximizes small-message MPI collective performance at scale. PiP-MColl features efficient multiple-sender and multiple-receiver collective algorithms and leverages Process-in-Process shared memory techniques to eliminate unnecessary system calls, page-fault overhead, and extra data copies, improving intra- and inter-node message rate and throughput. Our design also boosts performance for larger messages, resulting in comprehensive improvement for various message sizes. Experimental results show that PiP-MColl outperforms popular MPI libraries, including OpenMPI, MVAPICH2, and Intel MPI, by up to 4.6X for MPI collectives like MPI_Scatter and MPI_Allgather.
FT-BLAS: A High Performance BLAS Implementation With Online Fault Tolerance
2021
Basic Linear Algebra Subprograms (BLAS) is a core library in scientific computing and machine learning. This paper presents FT-BLAS, a new implementation of BLAS routines that not only tolerates soft errors on the fly, but also provides comparable performance to modern state-of-the-art BLAS libraries on widely-used processors such as Intel Skylake and Cascade Lake. To accommodate the features of BLAS, which contains both memory-bound and computing-bound routines, we propose a hybrid strategy to incorporate fault tolerance into our brand-new BLAS implementation: duplicating computing instructions for memory-bound Level-1 and Level-2 BLAS routines and incorporating an Algorithm-Based Fault Tolerance mechanism for computing-bound Level-3 BLAS routines. Our high performance and low overhead are obtained from delicate assembly-level optimization and a kernel-fusion approach to the computing kernels. Experimental results demonstrate that FT-BLAS offers high reliability and high performance -- faster than Intel MKL, OpenBLAS, and BLIS by up to 3.50%, 22.14% and 21.70%, respectively, for routines spanning all three levels of BLAS we benchmarked, even under hundreds of errors injected per minute.
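The Algorithm-Based Fault Tolerance mechanism FT-BLAS applies to Level-3 routines descends from the classic checksum scheme: append a column-sum row to A and a row-sum column to B, and the product then carries checksums that reveal a corrupted element of C. This is a toy pure-Python sketch of that checksum idea, not the paper's assembly-level implementation; the matrices are invented for illustration.

```python
def matmul(A, B):
    """Plain triple-loop matrix product for small lists of lists."""
    n, k, m = len(A), len(B), len(B[0])
    return [[sum(A[i][p] * B[p][j] for p in range(k)) for j in range(m)]
            for i in range(n)]

def abft_gemm(A, B):
    """Checksum-protected C = A*B: multiply the checksum-augmented
    matrices, then verify that C's row/column sums match the checksums."""
    Ac = A + [[sum(col) for col in zip(*A)]]   # append column-sum row
    Bc = [row + [sum(row)] for row in B]       # append row-sum column
    Cc = matmul(Ac, Bc)
    C = [row[:-1] for row in Cc[:-1]]          # strip checksums back off
    ok_rows = all(abs(sum(C[i]) - Cc[i][-1]) < 1e-9 for i in range(len(C)))
    ok_cols = all(abs(sum(C[i][j] for i in range(len(C))) - Cc[-1][j]) < 1e-9
                  for j in range(len(C[0])))
    return C, ok_rows and ok_cols

A = [[1.0, 2.0], [3.0, 4.0]]
B = [[5.0, 6.0], [7.0, 8.0]]
C, clean = abft_gemm(A, B)
print(C, clean)   # → [[19.0, 22.0], [43.0, 50.0]] True
```

A soft error that flips one element of C breaks exactly one row checksum and one column checksum, which is what lets ABFT both detect and locate the fault with O(n^2) extra work on an O(n^3) computation.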
Self-adapting numerical software (SANS) effort
by Dongarra, J.; Seymour, K.; You, H.
in Algorithms; Application programming interface; Communication
2006
The challenge for the development of next-generation software is the successful management of the complex computational environment while delivering to the scientist the full power of flexible compositions of the available algorithmic alternatives. Self-adapting numerical software (SANS) systems are intended to meet this significant challenge. The process of arriving at an efficient numerical solution of problems in computational science involves numerous decisions by a numerical expert. Attempts to automate such decisions distinguish three levels: algorithmic decisions, management of the parallel environment, and processor-specific tuning of kernels. Additionally, at any of these levels we can decide to rearrange the user's data. In this paper we look at a number of efforts at the University of Tennessee to investigate these areas.
Journal Article