Catalogue Search | MBRL

Scale tests of the new DUNE data pipeline

by Timm, Steven in CERN , Data acquisition , Data storage

2024

In preparation for the second runs of the ProtoDUNE detectors at CERN (NP02 and NP04)[1], DUNE has established a new data pipeline for bringing the data from the EHN-1 experimental hall at CERN to primary tape storage at Fermilab and CERN, and then spreading it out to a distributed disk data store at many locations around the world. This system includes a new Ingest Daemon and a new Declaration Daemon. The Rucio[2] replica catalog and FTS3 transport are used to move all files. All file metadata is declared to the new MetaCat[3] metadata service. All of these new components have been successfully tested at a scale equal to the expected output of the detector data acquisition system (~2-4 GB/s) and the expected network bandwidth out of the experimental hall. We present the test procedure and the results of the test.

Journal Article


HEPCloud Operations at Fermilab—The First Five Years

by Knoepfel, Kyle , Smith, Nick , Timm, Steven in Machine learning , Provisioning , Quantum computing

2025

The HEPCloud Facility at Fermilab has now been in production operation for five years. This facility is a unified provisioning gateway to US high performance computing centers, including NERSC, OLCF, and ALCF, other large supercomputers run by the NSF, and commercial clouds. HEPCloud delivers hundreds of millions of core-hours yearly for CMS. HEPCloud also serves other Fermilab experiments including DUNE, Mu2e, Muon g-2, and NOvA. In this paper we present the practical considerations of operating a distributed facility such as HEPCloud. We also mention some of the interesting research and development that HEPCloud has been used for, including GPU-based machine learning inference servers and tests of quantum computing.

Journal Article


FTS3: Data Movement Service in containers deployed in OKD

by Holzman, Burt , Timm, Steven , Karavakis, Edward in Containers , Large Hadron Collider

2021

The File Transfer Service (FTS3) is a data movement service developed at CERN which is used to distribute the majority of the Large Hadron Collider’s data across the Worldwide LHC Computing Grid (WLCG) infrastructure. At Fermilab, we have deployed FTS3 instances for Intensity Frontier experiments (e.g. DUNE) to transfer data in America and Europe, using a container-based strategy. In this article we summarize our experience building Docker images based on work from the SLATE project (slateci.io) and deploying them in OKD, the community distribution of Red Hat OpenShift. Additionally, we discuss our method of certificate management and maintenance utilizing Kubernetes CronJobs. Finally, we report on the configuration currently running at Fermilab.

Journal Article


vcluster: a framework for auto scalable virtual cluster system in heterogeneous clouds

by Timm, Steven C. , Noh, Seo-Young , Jang, Haengjin in Cloud computing , Clusters , Computer Communication Networks

2014

Cloud computing is an emerging technology and is being widely considered for resource utilization in various research areas. One of the main advantages of cloud computing is its flexibility in computing resource allocations. Many computing cycles can be ready in a very short time and can be smoothly reallocated between tasks. Because of this, many private companies are entering the new business of reselling their idle computing cycles. Research institutes have also started building their own cloud systems for their various research purposes. In this paper, we introduce a framework for a virtual cluster system called vcluster, which is capable of utilizing computing resources from heterogeneous clouds and provides a uniform view in computing resource management. vcluster is an IaaS (Infrastructure as a Service) based cloud resource management system. It distributes batch jobs to multiple clouds depending on the status of the queue and system pool. The main design philosophy behind vcluster is to be agnostic to the cloud and batch system, which is achieved through plugins. This feature mitigates the complexity of integrating heterogeneous clouds. In the pilot system development, we use FermiCloud and Amazon EC2, which are a private and a public cloud system, respectively. In this paper, we also discuss the features and functionalities that must be considered in virtual cluster systems.

Journal Article


HPC resource integration into CMS Computing via HEPCloud

by Tiradani, Anthony , Aftab Khan, Farrukh , Gutsche, Oliver in Computing costs , Economic models , Large Hadron Collider

2019

The higher energy and luminosity from the LHC in Run 2 have put increased pressure on CMS computing resources. Extrapolating to even higher luminosities (and thus higher event complexities and trigger rates) beyond Run 3, it becomes clear that simply scaling up the current model of CMS computing alone will become economically unfeasible. High Performance Computing (HPC) facilities, widely used in scientific computing outside of HEP, have the potential to help fill the gap. Here we describe the U.S. CMS efforts to integrate US HPC resources into CMS Computing via the HEPCloud project at Fermilab. We present advancements in our ability to use NERSC resources at scale and efforts to integrate other HPC sites as well. We present experience in the elastic use of HPC resources, quickly scaling up use when so required by CMS workflows. We also present performance studies of the CMS multi-threaded framework on both Haswell and KNL HPC resources.

Journal Article


HEPCloud, an Elastic Hybrid HEP Facility using an Intelligent Decision Support System

by Bejar, Jose Caballero , Moibenko, Alexander , Fuess, Stuart in Automation , Computational grids , Computing costs

2019

HEPCloud is rapidly becoming the primary system for provisioning compute resources for all Fermilab-affiliated experiments. In order to reliably meet the peak demands of the next generation of High Energy Physics experiments, Fermilab must plan to elastically expand its computational capabilities to cover the forecasted need. Commercial cloud and allocation-based High Performance Computing (HPC) resources both have explicit and implicit costs that must be considered when deciding when to provision these resources, and at which scale. In order to support such provisioning in a manner consistent with organizational business rules and budget constraints, we have developed a modular intelligent decision support system (IDSS) to aid in the automatic provisioning of resources spanning multiple cloud providers, multiple HPC centers, and grid computing federations. In this paper, we discuss the goals and architecture of the HEPCloud Facility, the architecture of the IDSS, and our early experience in using the IDSS for automated facility expansion both at Fermi and Brookhaven National Laboratory.

Journal Article


Hardware-accelerated inference for real-time gravitational-wave astronomy

by Coughlin, Michael , Katsavounidis, Erik , Nguyen, Tri in 639/33/34/2810 , 639/705/794 , 639/766/930/1032

2022

Computational demands in gravitational-wave astronomy are expected to at least double over the next five years. As kilometre-scale interferometers are brought to design sensitivity, real-time delivery of gravitational-wave alerts will become increasingly important to enable multimessenger follow-up. Here we discuss a novel implementation and deployment of deep learning inference for real-time data denoising and astrophysical source identification. This objective is accomplished using a generic inference-as-a-service model capable of adapting to the future needs of gravitational-wave data analysis. The implementation allows seamless incorporation of hardware accelerators and also enables the use of commercial or private as-a-service computing. Low-latency and offline computing in gravitational-wave astronomy addresses key challenges in scalability and reliability and provides a data analysis platform particularly optimized for deep learning applications. There is a growing need for data cleaning and source identification for gravitational-wave detectors in real time. A deep learning inference-as-a-service framework using off-the-shelf software and hardware can address these challenges in a scalable and reliable way.

Journal Article


vcluster: a framework for auto scalable virtual cluster system in heterogeneous clouds : Multimedia Computing for Industry

by Timm, Steven C. , Jang, Haengjin , Noh, Seo-Young in Applied sciences , Computer science; control theory; systems , Computer systems and distributed systems. User interface

2014

Journal Article


Toward SVOPME, a Scalable Virtual Organization Privileges Management Environment

by Timm, Steven , Wang, Nanbor , Ananthan, Balamurali in Codification , Configurations , Consistency

2011

Grids enable uniform access to resources by implementing standard interfaces to resource gateways. In the Open Science Grid (OSG), privileges are granted on the basis of the user's membership in a Virtual Organization (VO). However, user privilege definitions and enforcements are administered separately by VOs and Grid sites. Such partitioning can potentially introduce inconsistent user privileges throughout the Grid and break the Grid paradigm of uniform access to resources. There is a need for an automated privilege management mechanism for a VO to codify privilege policies granted to its users, to propagate the policies to grid sites, and to identify and suggest remedies for non-supported VO privileges at individual sites. The Scalable Virtual Organization Privileges Management Environment (SVOPME) addresses this challenge in the context of the Open Science Grid (OSG). SVOPME provides tools for VOs to define and publish desired privileges. At a site, SVOPME tools help analyze access policies defined for VO users, verify policy consistency between VOs and sites, and suggest site configuration changes. This paper presents the designs and features of SVOPME tools and the lessons learned in applying SVOPME tools for OSG VOs and sites. Furthermore, we outline future improvements to SVOPME tools to adapt to a range of different site configurations and new privilege policies.

Journal Article


Hardware-accelerated Inference for Real-Time Gravitational-Wave Astronomy

by Coughlin, Michael , Katsavounidis, Erik , Nguyen, Tri in Accelerators , Astronomy , Binary stars

2021

The field of transient astronomy has seen a revolution with the first gravitational-wave detections and the arrival of the multi-messenger observations they enabled. Transformed by the first detection of binary black hole and binary neutron star mergers, computational demands in gravitational-wave astronomy are expected to grow by at least a factor of two over the next five years as the global network of kilometer-scale interferometers is brought to design sensitivity. With the increase in detector sensitivity, real-time delivery of gravitational-wave alerts will become increasingly important as an enabler of multi-messenger follow-up. In this work, we report a novel implementation and deployment of deep learning inference for real-time gravitational-wave data denoising and astrophysical source identification. This is accomplished using a generic Inference-as-a-Service model that is capable of adapting to the future needs of gravitational-wave data analysis. Our implementation allows seamless incorporation of hardware accelerators and also enables the use of commercial or private (dedicated) as-a-service computing. Based on our results, we propose a paradigm shift in low-latency and offline computing in gravitational-wave astronomy. Such a shift can address key challenges in peak usage, scalability and reliability, and provide a data analysis platform particularly optimized for deep learning applications. The achieved sub-millisecond scale latency will also be relevant for any machine learning-based real-time control systems that may be invoked in the operation of near-future and next generation ground-based laser interferometers, as well as the front-end collection, distribution and processing of data from such instruments.

Paper

