Catalogue Search | MBRL
4 result(s) for "Zhou, Pingqiang"
Tensor Train Random Projection
2023
This work proposes a Tensor Train Random Projection (TTRP) method for dimension reduction, where pairwise distances can be approximately preserved. Our TTRP is systematically constructed through a Tensor Train (TT) representation with TT-ranks equal to one. Based on the tensor train format, this random projection method can speed up the dimension reduction procedure for high-dimensional datasets and requires lower storage costs with little loss in accuracy, compared with existing methods. We provide a theoretical analysis of the bias and the variance of TTRP, which shows that this approach is an expected isometric projection with bounded variance, and we show that the scaling Rademacher variable is an optimal choice for generating the corresponding TT-cores. Detailed numerical experiments with synthetic datasets and the MNIST dataset are conducted to demonstrate the efficiency of TTRP.
Journal Article
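The abstract describes the construction well enough to sketch: with TT-ranks equal to one, the projection factorizes into one small Rademacher matrix per tensor mode, so it can be applied by contracting each mode separately instead of forming the full projection matrix. The sketch below is an illustrative reading of that idea in Python/NumPy, not the authors' code; the shapes, the default seed, and the 1/sqrt(M) normalization are assumptions.

```python
import numpy as np

def tt_random_projection(x, in_shape, out_shape, seed=0):
    """Rank-one TT (Kronecker-structured) random projection with Rademacher
    cores, sketched from the abstract; shapes and scaling are assumptions."""
    rng = np.random.default_rng(seed)
    # One small +/-1 (Rademacher) matrix per tensor mode.
    cores = [rng.choice([-1.0, 1.0], size=(m, n))
             for m, n in zip(out_shape, in_shape)]
    t = np.asarray(x, dtype=float).reshape(in_shape)
    # Contract each mode of the reshaped input with its core.
    for k, core in enumerate(cores):
        t = np.tensordot(core, t, axes=([1], [k]))  # contracted axis moves to front
        t = np.moveaxis(t, 0, k)                    # restore mode order
    return t.reshape(-1) / np.sqrt(np.prod(out_shape))  # expected-isometry scaling

# Example: reduce a 4096-dimensional vector to 64 dimensions.
x = np.random.default_rng(1).standard_normal(16 * 16 * 16)
y = tt_random_projection(x, in_shape=(16, 16, 16), out_shape=(4, 4, 4))
print(np.linalg.norm(x), np.linalg.norm(y))  # norms should be close on average
```

Working mode by mode keeps the stored random entries at the sum of the core sizes rather than the product of the full input and output dimensions, which is where the storage and speed advantage claimed in the abstract comes from.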
Interconnect Design Techniques for Multicore and 3D Integrated Circuits
2012
Over the past 40 years, the semiconductor industry has witnessed the exponential growth in system complexity predicted by Moore's law, facilitated by continuously shrinking transistor and wire dimensions. Three-dimensional (3D) circuit technology, with multiple tiers of active devices stacked above each other, is a key approach to achieving increasing levels of integration and performance in the future. Concomitant with exponentially shrinking device dimensions, designers face new challenges in maximizing computation while remaining within a stringent power envelope. Over the last decade, multicore processors have emerged as a potential solution to some of these problems, integrating multiple smaller and more energy-efficient cores in place of a single, larger core. These cores must communicate through an efficient on-chip interconnection network, using ideas such as networks-on-chip (NoCs), and NoC design is vital to both performance and power. This thesis presents solutions to challenges in on-chip interconnect, specifically the on-chip communication and power delivery networks of 3D and multicore chips.
The first part of this thesis focuses on developing techniques for designing efficient, high-performance NoC architectures for 3D and multicore chips. Depending on the nature of the application, the multicore system may be either a system-on-chip (SoC), which executes a relatively well-characterized workload, or a chip multiprocessor (CMP), a general-purpose processor that must handle a variety of workloads. For SoCs, this thesis presents an efficient algorithm to synthesize application-specific NoC architectures in a 3D environment. We demonstrate that this method finds greatly improved solutions compared to a baseline algorithm reflecting prior work. We also study the impact of various factors on network performance in 3D NoCs, including the through-silicon via (TSV) count and the number of 3D tiers. For CMPs, we observe that voltage and frequency scaling (VFS) for the NoC can potentially reduce energy consumption, but the associated increase in latency and degradation in throughput limit its deployment. We therefore propose flexible-pipeline routers that reconfigure their pipeline stages upon VFS, so that latency through such routers remains constant. With minimal hardware overhead, deploying such routers allows us to reduce network frequency and save network energy without significant performance degradation.
The second part of this thesis is concerned with the design and optimization of the power delivery network for 3D and multicore chips. First, we propose a novel paradigm that exploits a new type of capacitor, the metal-insulator-metal (MIM) capacitor, together with traditional CMOS decoupling capacitors (decaps), to optimize power supply noise in 3D chips. Experimental results show that power grid noise can be optimized more effectively after the introduction of MIM decaps, with lower leakage power and little increase in routing congestion, compared to a solution using CMOS decaps only. Second, we explore the design and optimization of on-chip switched-capacitor (SC) DC-DC converters for multicore processors. On one hand, using an accurate power grid simulator, we find that a distributed design of SC converters can reduce the IR drop significantly compared to a lumped design, improving the supply voltage. On the other hand, the efficiency of a power delivery system using SC converters is a major concern that has not been addressed at the system level in prior research. We develop models for the efficiency of such a system as a function of the size and layout of the SC converters, and propose an approach to minimize power loss by optimizing their size and layout. The effectiveness of these techniques is demonstrated on both homogeneous and heterogeneous multicore chips.
Dissertation
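One point in the abstract lends itself to a small worked example: the flexible-pipeline router keeps absolute latency constant under voltage/frequency scaling by merging pipeline stages when the clock slows down. The toy numbers below are purely illustrative and are not taken from the dissertation.

```python
def router_latency_ns(pipeline_stages: int, freq_ghz: float) -> float:
    """One router traversal costs one clock cycle per pipeline stage."""
    return pipeline_stages / freq_ghz

# Nominal operation: a 4-stage router pipeline at 2 GHz.
print(router_latency_ns(4, 2.0))  # 2.0 ns

# Naive frequency scaling: the same 4 stages at 1 GHz double the latency.
print(router_latency_ns(4, 1.0))  # 4.0 ns

# Flexible pipeline: reconfigure to 2 stages at 1 GHz; latency is unchanged,
# while the lower frequency (and voltage) saves network energy.
print(router_latency_ns(2, 1.0))  # 2.0 ns
```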
Tensor Train Random Projection
2021
This work proposes a novel tensor train random projection (TTRP) method for dimension reduction, where pairwise distances can be approximately preserved. Our TTRP is systematically constructed through a tensor train (TT) representation with TT-ranks equal to one. Based on the tensor train format, this new random projection method can speed up the dimension reduction procedure for high-dimensional datasets and requires lower storage costs with little loss in accuracy, compared with existing methods. We provide a theoretical analysis of the bias and the variance of TTRP, which shows that this approach is an expected isometric projection with bounded variance, and we show that the Rademacher distribution is an optimal choice for generating the corresponding TT-cores. Detailed numerical experiments with synthetic datasets and the MNIST dataset are conducted to demonstrate the efficiency of TTRP.
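Both versions of this work center on the same property, that pairwise distances survive the projection, so a quick empirical check makes the claim concrete. The snippet below builds the rank-one TT projection as an explicit Kronecker product of Rademacher cores and compares pairwise distances before and after; the dense matrix is formed only for this check (the point of the TT structure is to avoid it), and all sizes are illustrative assumptions.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)
in_shape, out_shape = (16, 16, 16), (4, 4, 4)   # 4096 -> 64 dimensions
cores = [rng.choice([-1.0, 1.0], size=(m, n)) for m, n in zip(out_shape, in_shape)]

R = cores[0]
for core in cores[1:]:
    R = np.kron(R, core)                         # dense equivalent, for checking only
R /= np.sqrt(np.prod(out_shape))                 # expected-isometry scaling

X = rng.standard_normal((20, int(np.prod(in_shape))))   # 20 random high-dim points
Y = X @ R.T
ratios = [np.linalg.norm(Y[i] - Y[j]) / np.linalg.norm(X[i] - X[j])
          for i, j in combinations(range(20), 2)]
print(f"distance ratio: mean={np.mean(ratios):.3f}, std={np.std(ratios):.3f}")
```

A mean ratio near 1 with a modest spread is the behaviour the bias and variance analysis described in the abstract predicts.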
ICARUS: A Specialized Architecture for Neural Radiance Fields Rendering
by Chen, Anpei; Ma, Yu; Yu, Jingyi
in Computer architecture; Energy efficiency; Graphics processing units
2022
The practical deployment of Neural Radiance Fields (NeRF) in rendering applications faces several challenges, with the most critical one being low rendering speed even on high-end graphics processing units (GPUs). In this paper, we present ICARUS, a specialized accelerator architecture tailored for NeRF rendering. Unlike GPUs, which use general-purpose compute and memory architectures for NeRF, ICARUS executes the complete NeRF pipeline using dedicated plenoptic cores (PLCore) consisting of a positional encoding unit (PEU), a multi-layer perceptron (MLP) engine, and a volume rendering unit (VRU). A PLCore takes in positions and directions and renders the corresponding pixel colors without any intermediate data going off-chip for temporary storage and exchange, which can be time- and power-consuming. To implement the most expensive component of NeRF, i.e., the MLP, we transform the fully connected operations into approximated reconfigurable multiple constant multiplications (MCMs), where common subexpressions are shared across different multiplications to improve computation efficiency. We build a prototype of ICARUS using the Synopsys HAPS-80 S104, a field programmable gate array (FPGA)-based prototyping system for large-scale integrated circuit and system design. We evaluate the power-performance-area (PPA) of a PLCore using 40 nm LP CMOS technology. Working at 400 MHz, a single PLCore occupies 16.5 mm² and consumes 282.8 mW, translating to 0.105 µJ/sample. The results are compared with those of GPU and tensor processing unit (TPU) implementations.
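To make the pipeline the abstract describes easier to picture, here is a minimal software analogue of the three stages a plenoptic core chains together for each ray sample: positional encoding, an MLP query, and volume-rendering accumulation. This is a generic NeRF sketch in NumPy under standard assumptions (encoding frequencies, layer sizes, ReLU activations), not the ICARUS hardware or its MCM arithmetic.

```python
import numpy as np

def positional_encoding(x, num_freqs=10):
    """Standard NeRF encoding: sin/cos of the input at octave-spaced frequencies
    (the job of the positional encoding unit, PEU)."""
    freqs = (2.0 ** np.arange(num_freqs)) * np.pi
    scaled = np.outer(freqs, x).ravel()
    return np.concatenate([np.sin(scaled), np.cos(scaled)])

def mlp(features, weights, biases):
    """Fully connected layers with ReLU (the MLP engine); ICARUS approximates
    these matrix multiplies with reconfigurable multiple constant multiplications."""
    h = features
    for W, b in zip(weights[:-1], biases[:-1]):
        h = np.maximum(W @ h + b, 0.0)
    return weights[-1] @ h + biases[-1]            # e.g. density + RGB per sample

def composite(densities, colors, deltas):
    """Volume rendering along one ray (the VRU): alpha-composite sample colors."""
    alphas = 1.0 - np.exp(-np.maximum(densities, 0.0) * deltas)
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alphas[:-1]]))
    return ((trans * alphas)[:, None] * colors).sum(axis=0)   # final pixel color

# One sample: encode a 3D position and query a randomly initialized (untrained) MLP.
rng = np.random.default_rng(0)
enc = positional_encoding(np.array([0.1, -0.2, 0.3]))         # 3 coords -> 60 features
W = [rng.standard_normal((64, enc.size)) * 0.1, rng.standard_normal((4, 64)) * 0.1]
b = [np.zeros(64), np.zeros(4)]
print(mlp(enc, W, b))                                          # density + RGB output
```

Keeping all three of these stages on one core, with no intermediate data leaving the chip, is the architectural point the abstract emphasizes.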