Catalogue Search | MBRL

by Gutsche Oliver , Mason David , Lannon Kevin

2024

The HL-LHC run is anticipated to start at the end of this decade and will pose a significant challenge for the scale of the HEP software and computing infrastructure. The mission of the U.S. CMS Software & Computing Operations Program is to develop and operate the software and computing resources necessary to process CMS data expeditiously and to enable U.S. physicists to fully participate in the physics of CMS. We have developed a strategic plan to prioritize R&D efforts to reach this goal for the HL-LHC. This plan includes four grand challenges: modernizing physics software and improving algorithms, building infrastructure for exabyte-scale datasets, transforming the scientific data analysis process and transitioning from R&D to operations. We are involved in a variety of R&D projects that fall within these grand challenges. In this talk, we will introduce our four grand challenges and outline the R&D program of the U.S. CMS Software & Computing Operations Program.

Journal Article

Abstracting container technologies and transfer mechanisms in the Scalable CyberInfrastructure for Artificial Intelligence and Likelihood Free Inference (SCAILFIN) project

by Brenner, Paul , Hurtado Anampa, Kenyi , Kankel, Cody in Artificial intelligence , Containers , Inference

2020

High Performance Computing (HPC) facilities provide vast computational power and storage, but generally work on fixed environments designed to address the most common software needs locally, making it challenging for users to bring their own software. To overcome this issue, most HPC facilities have added support for HPC-friendly container technologies such as Shifter, Singularity, or Charliecloud. These container technologies are all compatible with the more popular Docker containers; however, the implementation and use of said containers differ for each HPC-friendly container technology. These usage differences can make it difficult for an end user to submit to and utilize different HPC sites without adjusting their workflows and software. This issue is exacerbated when attempting to utilize workflow management software between different sites with differing container technologies. The SCAILFIN project aims to develop and deploy artificial intelligence (AI) and likelihood-free inference (LFI) techniques and software using scalable cyberinfrastructure (CI) that spans multiple sites. The project has extended the CERN-based REANA framework, a platform designed to enable analysis reusability and reproducibility while supporting different workflow engine languages, in order to support submission to different HPC facilities. The work presented here focuses on the development of an abstraction layer that supports different container technologies and different transfer protocols for files and directories between the HPC facility and the REANA cluster edge service from the user’s workflow application.

Journal Article

The U.S. CMS HL-LHC R&D Strategic Plan

by Gray, Lindsey , Letts, James , Gutsche, Oliver in Algorithms , Computation , Data analysis

2024

The HL-LHC run is anticipated to start at the end of this decade and will pose a significant challenge for the scale of the HEP software and computing infrastructure. The mission of the U.S. CMS Software & Computing Operations Program is to develop and operate the software and computing resources necessary to process CMS data expeditiously and to enable U.S. physicists to fully participate in the physics of CMS. We have developed a strategic plan to prioritize R&D efforts to reach this goal for the HL-LHC. This plan includes four grand challenges: modernizing physics software and improving algorithms, building infrastructure for exabyte-scale datasets, transforming the scientific data analysis process and transitioning from R&D to operations. We are involved in a variety of R&D projects that fall within these grand challenges. In this talk, we will introduce our four grand challenges and outline the R&D program of the U.S. CMS Software & Computing Operations Program.

Journal Article

Scaling up a CMS tier-3 site with campus resources and a 100 Gb/s network connection: what could go wrong?

by Li, Wenzhao , Anampa, Kenyi Hurtado , Brenner, Paul in Computation , Physics , Wide area networks

2017

The University of Notre Dame (ND) CMS group operates a modest-sized Tier-3 site suitable for local, final-stage analysis of CMS data. However, through the ND Center for Research Computing (CRC), Notre Dame researchers have opportunistic access to roughly 25k CPU cores of computing and a 100 Gb/s WAN link. To understand the limits of what might be possible in this scenario, we undertook to use these resources for a wide range of CMS computing tasks, from user analysis through large-scale Monte Carlo production (including both detector simulation and data reconstruction). We will discuss the challenges inherent in effectively utilizing CRC resources for these tasks and the solutions deployed to overcome them.

Journal Article

Opportunistic Computing with Lobster: Lessons Learned from Scaling up to 25k Non-Dedicated Cores

by Li, Wenzhao , Yannakopoulos, Anna , Anampa, Kenyi Hurtado in Communications systems , Monitoring , Optimization

2017

We previously described Lobster, a workflow management tool for exploiting volatile opportunistic computing resources for computation in HEP. We will discuss the various challenges that have been encountered while scaling up the simultaneous CPU core utilization and the software improvements required to overcome these challenges. Categories: Workflows can now be divided into categories based on their required system resources. This allows the batch queueing system to optimize assignment of tasks to nodes with the appropriate capabilities. Within each category, limits can be specified for the number of running jobs to regulate the utilization of communication bandwidth. System resource specifications for a task category can now be modified while a project is running, avoiding the need to restart the project if resource requirements differ from the initial estimates. Lobster now implements time limits on each task category to voluntarily terminate tasks. This allows partially completed work to be recovered. Workflow dependency specification: One workflow often requires data from other workflows as input. Rather than waiting for earlier workflows to be completed before beginning later ones, Lobster now allows dependent tasks to begin as soon as sufficient input data has accumulated. Resource monitoring: Lobster utilizes a new capability in Work Queue to monitor the system resources each task requires in order to identify bottlenecks and optimally assign tasks. The capability of the Lobster opportunistic workflow management system for HEP computation has been significantly increased. We have demonstrated efficient utilization of 25 000 non-dedicated cores and achieved a data input rate of 30 Gb/s and an output rate of 500 GB/h. This has required new capabilities in task categorization, workflow dependency specification, and resource monitoring.

Journal Article

Exploiting volatile opportunistic computing resources with Lobster

by Anampa, Kenyi Hurtado , Brenner, Paul , Wolf, Matthias in Computation , Data storage , File servers

2015

Analysis of high energy physics experiments using the Compact Muon Solenoid (CMS) at the Large Hadron Collider (LHC) can be limited by availability of computing resources. As a joint effort involving computer scientists and CMS physicists at Notre Dame, we have developed an opportunistic workflow management tool, Lobster, to harvest available cycles from university campus computing pools. Lobster consists of a management server, file server, and worker processes which can be submitted to any available computing resource without requiring root access. Lobster makes use of the Work Queue system to perform task management, while the CMS-specific software environment is provided via CVMFS and Parrot. Data is handled via Chirp and Hadoop for local data storage and XrootD for access to the CMS wide-area data federation. An extensive set of monitoring and diagnostic tools has been developed to facilitate system optimisation. We have tested Lobster using the 20 000-core cluster at Notre Dame, achieving approximately 8-10k tasks running simultaneously, sustaining approximately 9 Gbit/s of input data and 340 Mbit/s of output data.

Journal Article

Analysis Cyberinfrastructure: Challenges and Opportunities

by Brenner, Paul , Kenyi Hurtado Anampa , Thain, Doug in Data analysis , Histograms , Luminosity

2022

Analysis cyberinfrastructure refers to the combination of software and computer hardware used to support late-stage data analysis in High Energy Physics (HEP). For the purposes of this white paper, late-stage data analysis refers specifically to the step of transforming the most reduced common data format produced by a given experimental collaboration (for example, nanoAOD for the CMS experiment) into histograms. In this white paper, we reflect on observations gathered from a recent experience with data analysis using a recent, Python-based analysis framework, and extrapolate these experiences through the High-Luminosity LHC era as a way of highlighting potential R&D topics in analysis cyberinfrastructure.

Paper

The U.S. CMS HL-LHC R&D Strategic Plan

by Gray, Lindsey , Letts, James , Gutsche, Oliver in Algorithms , Computation , Data analysis

2023

The HL-LHC run is anticipated to start at the end of this decade and will pose a significant challenge for the scale of the HEP software and computing infrastructure. The mission of the U.S. CMS Software & Computing Operations Program is to develop and operate the software and computing resources necessary to process CMS data expeditiously and to enable U.S. physicists to fully participate in the physics of CMS. We have developed a strategic plan to prioritize R&D efforts to reach this goal for the HL-LHC. This plan includes four grand challenges: modernizing physics software and improving algorithms, building infrastructure for exabyte-scale datasets, transforming the scientific data analysis process and transitioning from R&D to operations. We are involved in a variety of R&D projects that fall within these grand challenges. In this talk, we will introduce our four grand challenges and outline the R&D program of the U.S. CMS Software & Computing Operations Program.

Paper

Snowmass 2013 Computing Frontier Storage and Data Management

by Mount, Richard , Butler, Michelle , Hildreth, Mike in Data management , Data storage , Energy management

2013

The data storage and data management needs are summarized for the energy frontier, intensity frontier, cosmic frontier, lattice field theory, perturbative QCD and accelerator science. The outlook for data storage technologies and costs is then outlined, followed by a summary of the current state of data, software and physics analysis capability preservation. The HEP outlook is summarized, pointing out where future data volumes may strain against what is technologically and financially feasible. Finally recommendations for areas of particular attention and action are made.

Paper

Status Report of the DPHEP Collaboration: A Global Effort for Sustainable Data Preservation in High Energy Physics

by Amerio, Silvia , Viljoen, Matthew , Barbera, Roberto in Accelerators , Collaboration , High energy physics

2016

Data from High Energy Physics (HEP) experiments are collected with significant financial and human effort and are mostly unique. An inter-experimental study group on HEP data preservation and long-term analysis was convened as a panel of the International Committee for Future Accelerators (ICFA). The group was formed by large collider-based experiments and investigated the technical and organizational aspects of HEP data preservation. An intermediate report was released in November 2009 addressing the general issues of data preservation in HEP and an extended blueprint paper was published in 2012. In July 2014 the DPHEP collaboration was formed as a result of the signature of the Collaboration Agreement by seven large funding agencies (others have since joined or are in the process of acquisition) and in June 2015 the first DPHEP Collaboration Workshop and Collaboration Board meeting took place. This status report of the DPHEP collaboration details the progress during the period from 2013 to 2015 inclusive.

Paper
