Catalogue Search | MBRL
Search Results Heading
Explore the vast range of titles available.
MBRLSearchResults
-
DisciplineDiscipline
-
Is Peer ReviewedIs Peer Reviewed
-
Item TypeItem Type
-
SubjectSubject
-
YearFrom:-To:
-
More FiltersMore FiltersSourceLanguage
Done
Filters
Reset
331
result(s) for
"Schmidt, Andrew G."
Sort by:
HwPMI : An Extensible Performance Monitoring Infrastructure for Improving Hardware Design and Productivity on FPGAs
2012
Designing hardware cores for FPGAs can quickly become a complicated task, difficult even for experienced engineers. With the addition of more sophisticated development tools and maturing high-level language-to-gates techniques, designs can be rapidly assembled; however, when the design is evaluated on the FPGA, the performance may not be what was expected. Therefore, an engineer may need to augment the design to include performance monitors to better understand the bottlenecks in the system or to aid in the debugging of the design. Unfortunately, identifying what to monitor and adding the infrastructure to retrieve the monitored data can be a challenging and time-consuming task. Our work alleviates this effort. We present the Hardware Performance Monitoring Infrastructure (HwPMI), which includes a collection of software tools and hardware cores that can be used to profile the current design, recommend and insert performance monitors directly into the HDL or netlist, and retrieve the monitored data with minimal invasiveness to the design. Three applications are used to demonstrate and evaluate HwPMI’s capabilities. The results are highly encouraging as the infrastructure adds numerous capabilities while requiring minimal effort by the designer and low resource overhead to the existing design.
Journal Article
An Evaluation of an Integrated On-Chip/Off-Chip Network for High-Performance Reconfigurable Computing
by
Sass, Ron
,
Kritikos, William V.
,
Schmidt, Andrew G.
in
Advantages
,
Communication systems
,
Communities
2012
As the number of cores per discrete integrated circuit (IC) device grows, the importance of the network on chip (NoC) increases. However, the body of research in this area has focused on discrete IC devices alone which may or may not serve the high-performance computing community which needs to assemble many of these devices into very large scale, parallel computing machines. This paper describes an integrated on-chip/off-chip network that has been implemented on an all-FPGA computing cluster. The system supports MPI-style point-to-point messages, collectives, and other novel communication. Results include the resource utilization and performance (in latency and bandwidth).
Journal Article
A Hardware Filesystem Implementation with Multidisk Support
by
Sass, Ron
,
Mendon, Ashwin A.
,
Schmidt, Andrew G.
in
Design
,
Field programmable gate arrays
,
Information storage
2009
Modern High-End Computing systems frequently include FPGAs as compute accelerators. These programmable logic devices now support disk controller IP cores which offer the ability to introduce new, innovative functionalities that, previously, were not practical. This article describes one such innovation: a filesystem implemented in hardware. This has the potential of improving the performance of data-intensive applications by connecting secondary storage directly to FPGA compute accelerators. To test the feasibility of this idea, a Hardware Filesystem was designed with four basic operations (open, read, write, and delete). Furthermore, multi-disk and RAID-0 (striping) support has been implemented as an option in the filesystem. A RAM Disk core was created to emulate a SATA disk drive so results on running FPGA systems could be readily measured. By varying the block size from 64 to 4096 bytes, it was found that 1024 bytes gave the best performance while using a very modest 7% of a Xilinx XC4VFX60's slices and only four (of the 232) BRAM blocks available.
Journal Article
Redsharc : A Programming Model and On-Chip Network for Multi-Core Systems on a Programmable Chip
2012
The reconfigurable data-stream hardware software architecture (Redsharc) is a programming model and network-on-a-chip solution designed to scale to meet the performance needs of multi-core Systems on a programmable chip (MCSoPC). Redsharc uses an abstract API that allows programmers to develop systems of simultaneously executing kernels, in software and/or hardware, that communicate over a seamless interface. Redsharc incorporates two on-chip networks that directly implement the API to support high-performance systems with numerous hardware kernels. This paper documents the API, describes the common infrastructure, and quantifies the performance of a complete implementation. Furthermore, the overhead, in terms of resource utilization, is reported along with the ability to integrate hard and soft processor cores with purely hardware kernels being demonstrated.
Journal Article
Research and development satellite account update: estimates for 1959-2007
2010
The Bureau of Economic Analysis (BEA) research and development (R&D) satellite account provides detailed statistics designed to facilitate research into the effects of R&D on the economy. The account shows how gross domestic product (GDP) and other measures would be affected if R&D spending were \"capitalized,\" that is, if R&D spending were treated as investment rather than as an expense. This update of the R&D satellite account extends BEA's estimates of the effects of R&D on economic growth through 2007, and it now includes coverage of the business cycle expansion that ended in December 2007. If R&D spending were treated as investment, real GDP would have grown slightly faster, on average, from 1959 to 2007. The average difference was 0.13 percentage point for 1959 to 1973. The picture of the economy presented in the revised R&D estimates is similar to that shown by the estimates published in 2007.
Journal Article
Radiation Hardening by Software Techniques on FPGAs: Flight Experiment Evaluation and Results
2017
We present our work on implementing Radiation Hardening by Software (RHBSW) techniques on the Xilinx Virtex5 FPGAs PowerPC 440 processors on the SpaceCube 2.0 platform. The techniques have been matured and tested through simulation modeling, fault emulation, laser fault injection and now in a flight experiment, as part of the Space Test Program- Houston 4-ISS SpaceCube Experiment 2.0 (STP-H4-ISE 2.0). This work leverages concepts such as heartbeat monitoring, control flow assertions, and checkpointing, commonly used in the High Performance Computing industry, and adapts them for use in remote sensing embedded systems. These techniques are extremely low overhead (typically <1.3%), enabling a 3.3x gain in processing performance as compared to the equivalent traditionally radiation hardened processor. The recently concluded STP-H4 flight experiment was an opportunity to upgrade the RHBSW techniques for the Virtex5 FPGA and demonstrate them on-board the ISS to achieve TRL 7. This work details the implementation of the RHBSW techniques, that were previously developed for the Virtex4-based SpaceCube 1.0 platform, on the Virtex5-based SpaceCube 2.0 flight platform. The evaluation spans the development and integration with flight software, remotely uploading the new experiment to the ISS SpaceCube 2.0 platform, and conducting the experiment continuously for 16 days before the platform was decommissioned. The experiment was conducted on two PowerPCs embedded within the Virtex5 FPGA devices and the experiment collected 19,400 checkpoints, processed 253,482 status messages, and incurred 0 faults. These results are highly encouraging and future work is looking into longer duration testing as part of the STP-H5 flight experiment.
Conference Proceeding
A Hardware Filesystem Implementation with Multidisk Support
2009
Modern High-End Computing systems frequently include FPGAs as compute accelerators. These programmable logic devices now support disk controller IP cores which offer the ability to introduce new, innovative functionalities that, previously, were not practical. This article describes one such innovation: a filesystem implemented in hardware. This has the potential of improving the performance of data-intensive applications by connecting secondary storage directly to FPGA compute accelerators. To test the feasibility of this idea, a Hardware Filesystem was designed with four basic operations (open, read, write, and delete). Furthermore, multi-disk and RAID-0 (striping) support has been implemented as an option in the filesystem. A RAM Disk core was created to emulate a SATA disk drive so results on running FPGA systems could be readily measured. By varying the block size from 64 to 4096 bytes, it was found that 1024 bytes gave the best performance while using a very modest 7% of a Xilinx XC4VFX60's slices and only four (of the 232) BRAM blocks available.
Journal Article
Evaluating Rapid Application Development with Python for Heterogeneous Processor-based FPGAs
by
French, Matthew
,
Weisz, Gabriel
,
Schmidt, Andrew G
in
Accelerators
,
Field programmable gate arrays
,
Hardware
2017
As modern FPGAs evolve to include more het- erogeneous processing elements, such as ARM cores, it makes sense to consider these devices as processors first and FPGA accelerators second. As such, the conventional FPGA develop- ment environment must also adapt to support more software- like programming functionality. While high-level synthesis tools can help reduce FPGA development time, there still remains a large expertise gap in order to realize highly performing implementations. At a system-level the skill set necessary to integrate multiple custom IP hardware cores, interconnects, memory interfaces, and now heterogeneous processing elements is complex. Rather than drive FPGA development from the hardware up, we consider the impact of leveraging Python to ac- celerate application development. Python offers highly optimized libraries from an incredibly large developer community, yet is limited to the performance of the hardware system. In this work we evaluate the impact of using PYNQ, a Python development environment for application development on the Xilinx Zynq devices, the performance implications, and bottlenecks associated with it. We compare our results against existing C-based and hand-coded implementations to better understand if Python can be the glue that binds together software and hardware developers.
Quantifying effective memory bandwidth of platform FPGAs
2007
The Reconfigurable Computing Cluster aims to use Platform FPGAs to determine the feasibility of building a cluster capable of scaling to the PetaFLOPs range. Platform FPGAs were chosen for their ability to host entire systems on a single chip. This thesis investigates several common bus architectures common to Platform FPGAs in order to measure the effective bandwidth between High Performance Computing cores and off-chip memory. Specifically three memory access patterns are implemented: random, strided, and sequential. Sequential can be further broken down into burst and non-burst mode transaction styles. In addition the effect of multiple cores on, not only the system, but each hardware core is studied. The results show that while some bus architectures are clearly better than others, none approach the theoretical bandwidth of the memory interface. Furthermore, negotiating the bus protocol is a significant source of overhead. So much so that it effectively hides any performance one might gain from trying to access the off-chip SDRAMs using an \"intelligent\" access pattern. Finally, burst mode (naturally) gives the highest performance, but unless the HPC core knows in advance that several sequential values are needed, non-burst sequential transaction accesses are no faster than strided or random.
Dissertation
SpaceCubeX: A Framework for Evaluating Hybrid Multi-Core CPU FPGA DSP Architectures
by
French, Matthew
,
Villalpando, Carlos Y
,
Weisz, Gabriel
in
Air quality
,
Central processing units
,
Climate change
2017
The SpaceCubeX project is motivated by the need for high performance, modular, and scalable on-board processing to help scientists answer critical 21st century questions about global climate change, air quality, ocean health, and ecosystem dynamics, while adding new capabilities such as low-latency data products for extreme event warnings. These goals translate into on-board processing throughput requirements that are on the order of 100-1,000 more than those of previous Earth Science missions for standard processing, compression, storage, and downlink operations. To study possible future architectures to achieve these performance requirements, the SpaceCubeX project provides an evolvable testbed and framework that enables a focused design space exploration of candidate hybrid CPU/FPGA/DSP processing architectures. The framework includes ArchGen, an architecture generator tool populated with candidate architecture components, performance models, and IP cores, that allows an end user to specify the type, number, and connectivity of a hybrid architecture. The framework requires minimal extensions to integrate new processors, such as the anticipated High Performance Spaceflight Computer (HPSC), reducing time to initiate benchmarking by months. To evaluate the framework, we leverage a wide suite of high performance embedded computing benchmarks and Earth science scenarios to ensure robust architecture characterization. We report on our projects Year 1 efforts and demonstrate the capabilities across four simulation testbed models, a baseline SpaceCube 2.0 system, a dual ARM A9 processor system, a hybrid quad ARM A53 and FPGA system, and a hybrid quad ARM A53 and DSP system.
Conference Proceeding