84 result(s) for "处理器" ("processor")
The Sunway TaihuLight supercomputer: system and applications
The Sunway TaihuLight supercomputer is the world's first system with a peak performance greater than 100 PFlops. In this paper, we provide a detailed introduction to the TaihuLight system. In contrast with other existing heterogeneous supercomputers, which include both CPU processors and PCIe-connected many-core accelerators (NVIDIA GPU or Intel Xeon Phi), the computing power of TaihuLight is provided by a homegrown many-core SW26010 CPU that includes both the management processing elements (MPEs) and computing processing elements (CPEs) in one chip. With 260 processing elements in one CPU, a single SW26010 provides a peak performance of over three TFlops. To alleviate the memory bandwidth bottleneck in most applications, each CPE comes with a scratch pad memory, which serves as a user-controlled cache. To support the parallelization of programs on the new many-core architecture, in addition to the basic C/C++ and Fortran compilers, the system provides a customized Sunway OpenACC tool that supports the OpenACC 2.0 syntax. This paper also reports our preliminary efforts on developing and optimizing applications on the TaihuLight system, focusing on key application domains, such as earth system modeling, ocean surface wave modeling, atomistic simulation, and phase-field simulation.
Experimental realization of single-shot nonadiabatic holonomic gates in nuclear spins
Nonadiabatic holonomic quantum computation has received increasing attention due to its robustness against control errors. However, all previous schemes need at least two sequentially implemented gates to realize a general one-qubit gate. Based on two recent reports, we construct two Hamiltonians and experimentally realize nonadiabatic holonomic gates by a single-shot implementation in a two-qubit nuclear magnetic resonance (NMR) system. Two noncommuting one-qubit holonomic gates, rotations about two different axes, are implemented by evolving a work qubit and an ancillary qubit nonadiabatically, following a purpose-designed quantum circuit. Using a sequence compiler developed for NMR quantum information processors, we optimize the whole pulse sequence, minimizing the total error of the implementation. Finally, all the nonadiabatic holonomic gates reach high unattenuated experimental fidelities of over 98%.
Research on a functional verification method for a processor model built with Chisel
As aviation hardware designs grow more complex, verification has become a major difficulty in chip design. Because verification takes up a large share of design time, shortening the overall design flow requires a method for finding design errors quickly. The design under test is ARMChisel, a processor model compatible with the ARM V4 instruction set architecture (ISA). The model is built with Chisel, a new hardware construction language, and is a highly complex hardware design. Based on this embedded processor model: ① a random instruction generator supporting all instructions of the ARM V4 ISA is designed to speed up the generation of test stimuli; ② based on the characteristics of Chisel, four verification stages are designed for the processor model under test (primary verification at the Chisel level, rapid coverage verification, directed test verification, and complex application verification) to ensure that the expected coverage is achieved; ③ a test platform based on the embedded processor model is built in both the Chisel and Verilog environments. The test platform can quickly and accurately find and locate errors while collecting coverage, which improves verification speed. Finally, an FPGA (field programmable gate array) is used to accelerate the verification of large-scale application programs and shorten the verification cycle.
Cooperative Computing Techniques for a Deeply Fused and Heterogeneous Many-Core Processor Architecture
Due to advances in semiconductor techniques, many-core processors have been widely used in high performance computing. However, many applications still cannot be carried out efficiently due to the memory wall, which has become a bottleneck in many-core processors. In this paper, we present a novel heterogeneous many-core processor architecture named deeply fused many-core (DFMC) for high performance computing systems. DFMC integrates management processing elements (MPEs) and computing processing elements (CPEs), which are heterogeneous processor cores for different application features with a unified ISA (instruction set architecture), a unified execution model, and shared memory that supports cache coherence. The DFMC processor can alleviate the memory wall problem by combining a series of cooperative computing techniques of CPEs, such as multi-pattern data stream transfer, an efficient register-level communication mechanism, and a fast hardware synchronization technique. These techniques improve on-chip data reuse and optimize memory access performance. This paper illustrates an implementation of a full system prototype based on FPGA with four MPEs and 256 CPEs. Our experimental results show that the effect of the cooperative computing techniques of CPEs is significant, with DGEMM (double-precision matrix multiplication) achieving an efficiency of 94%, FFT (fast Fourier transform) obtaining a performance of 207 GFLOPS, and FDTD (finite-difference time-domain) obtaining a performance of 27 GFLOPS.
Drosha and Dicer: Slicers cut from the same cloth
DROSHA and its partner DGCR8 form a heterotrimeric complex named Microprocessor, which is essential for microRNA biogenesis. A recent study by Kwon et al. in Cell reveals the structure of a DROSHA construct in complex with the C-terminal region of DGCR8, thereby unveiling the topology and interactions between components of the Microprocessor and insights into its 'ruler'-based cleavage activity and function.
Darwin: a neuromorphic hardware co-processor based on Spiking Neural Networks
Broadly speaking, the goal of neuromorphic engineering is to build computer systems that mimic the brain. A Spiking Neural Network (SNN) is a type of biologically inspired neural network that performs information processing based on discrete-time spikes, unlike a traditional Artificial Neural Network (ANN). Hardware implementation of SNNs is necessary for achieving high performance and low power consumption. We present the Darwin Neural Processing Unit (NPU), a neuromorphic hardware co-processor based on SNNs implemented with digital logic, supporting a maximum of 2048 neurons, 2048² = 4,194,304 synapses, and 15 possible synaptic delays. The Darwin NPU was fabricated in standard 180 nm CMOS technology with an area of 5 × 5 mm² and a clock frequency of 70 MHz in the worst case. It consumes 0.84 mW/MHz with a 1.8 V power supply for typical applications. Two prototype applications are used to demonstrate the performance and efficiency of the hardware implementation.
g-good-neighbor conditional diagnosability of star graph networks under PMC model and MM* model
Diagnosability of a multiprocessor system is an important research topic. S. L. Peng, C. K. Lin, J. J. M. Tan, and L. H. Hsu [Appl. Math. Comput., 2012, 218(21): 10406-10412] proposed a new measure for fault diagnosis of the system, called the g-good-neighbor conditional diagnosability, which requires every fault-free node to have at least g fault-free neighbors. As a well-known topological structure of interconnection networks, the n-dimensional star graph S_n has many good properties. In this paper, we establish the g-good-neighbor conditional diagnosability of S_n under the PMC model and the MM* model.
An Intra-Server Interconnect Fabric for Heterogeneous Computing
With the increasing diversity of application needs and computing units, servers with heterogeneous processors are becoming more and more widespread. However, the conventional SMP/ccNUMA server architecture introduces a communication bottleneck between heterogeneous processors and uses heterogeneous processors only as coprocessors, which limits the efficiency and flexibility of using them. To solve this problem, this paper proposes an intra-server interconnect fabric that supports both intra-server peer-to-peer interconnection and I/O resource sharing among heterogeneous processors. By connecting processors and I/O devices with the proposed fabric, heterogeneous processors can communicate directly with each other and run in stand-alone mode with shared intra-server resources. We design the proposed fabric by extending the de facto system I/O bus protocol PCIe (Peripheral Component Interconnect Express) and implement it with a single chip, cZodiac. By making full use of PCIe's original advantages, the interconnection and the I/O sharing mechanism are lightweight and efficient. Evaluations carried out on both the FPGA (Field Programmable Gate Array) prototype and the cycle-accurate simulator demonstrate that our design is feasible and scalable. In addition, our design is suitable not only for heterogeneous servers but also for high-density servers.
Preface
Dataflow architecture is a kind of computer architecture that contrasts with the traditional von Neumann architecture, or control flow architecture. Although it has not yet been commercially successful in the general-purpose processor market, the concepts of dataflow have been used in many areas such as digital signal processing, network routing, and scientific computing, as well as in parallel computing frameworks.
High-speed visual target tracking with mixed rotation invariant description and skipping searching
This paper proposes a novel high-speed visual target tracking system based on mixed rotation invariant description (MRID) and a skipping searching method. MRID is a novel rotation invariant description of texture and edge information using annular histograms and dominant direction. It overcomes the rotation variance and heavy computation of conventional LBP-HOG feature description. The skipping searching method used in tracking can remarkably decrease the computation time by avoiding repeated searching operations. The proposed tracking system contains an image sensor, a hierarchical vision processor, and an actuator with 2 degrees of freedom (DOF). The vision processor integrates processors with pixel- and row-level parallelism to speed up the tracking algorithm. Experimental results show that the proposed system can achieve a processing speed of over 1000 fps for the tracking algorithm on 750 × 480 resolution images.