5,081 result(s) for "In-Memory Data"
MM-DIRECT
Main-memory databases (MMDBs) keep the primary database in random-access memory (RAM) to provide high throughput and low latency. However, volatile memory makes MMDBs much more sensitive to system failures: the contents of the database are lost, and systems may be unavailable for a long time until the recovery process finishes. Novel recovery techniques are therefore needed to repair crashed MMDBs as quickly as possible. This paper presents MM-DIRECT (Main Memory Database Instant RECovery with Tuple consistent checkpoint), a recovery technique that lets an MMDB schedule transactions concurrently with the recovery process at system startup, giving the impression that the database is restored instantly. The approach implements a tuple-level consistent checkpoint to reduce recovery time. To validate the approach, experiments were performed on a prototype implemented on the Redis database. The results show that the instant recovery technique sustains high transaction throughput both during the recovery process and during normal database processing.
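To make the mechanism concrete, here is a minimal Python sketch of instant recovery over a simplified key-value store: transactions run immediately at startup, and any tuple they touch is restored on demand from the checkpoint and redo log. All names (RecoveringStore, the log layout) are hypothetical; the paper's actual prototype extends Redis.

    # Hypothetical sketch of on-demand tuple recovery; not the paper's code.

    class RecoveringStore:
        def __init__(self, checkpoint, redo_log):
            self.data = {}                  # tuples restored so far
            self.checkpoint = checkpoint    # key -> last checkpointed value
            self.redo_log = redo_log        # key -> ordered update functions
            self.recovered = set()

        def _recover_key(self, key):
            # Restore one tuple on first access: checkpoint value + log replay.
            if key in self.recovered:
                return
            value = self.checkpoint.get(key)
            for update in self.redo_log.get(key, []):
                value = update(value)       # replay logged updates in order
            if value is not None:
                self.data[key] = value
            self.recovered.add(key)

        def get(self, key):
            self._recover_key(key)          # a read triggers on-demand recovery
            return self.data.get(key)

        def put(self, key, value):
            self._recover_key(key)          # mark recovered before overwriting
            self.data[key] = value

    # Transactions can run immediately at startup:
    store = RecoveringStore(checkpoint={"x": 1},
                            redo_log={"x": [lambda v: v + 1]})
    print(store.get("x"))                   # 2, restored on demand

A background task can replay the remaining keys in parallel, so recovery also completes for tuples that incoming transactions never touch.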
ESL: A High-Performance Skiplist with Express Lane
With the increasing capacity and cost-efficiency of DRAM in multi-core environments, in-memory databases have emerged as fundamental solutions for delivering high performance. The index structure is a crucial component of an in-memory database: leveraging fast access to DRAM, it plays an important role in performance and scalability. The skiplist is one of the most widely used in-memory index structures and has been adopted by popular databases, but skiplists suffer from poor performance due to structural limitations. In this work, we propose ESL, a high-performance and scalable skiplist. ESL speeds up traversal by optimizing the index levels for the CPU cache and synergistically combining exponential and linear searches over those cache-optimized levels. In addition, ESL reduces synchronization overhead by updating the index levels asynchronously while tolerating inconsistencies. In our YCSB evaluation, ESL improves throughput by up to 2.8× over other skiplists and shows up to 35× lower tail latency. ESL also consistently shows higher throughput in our real-world workload evaluation.
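The traversal idea can be sketched briefly, assuming the "express lane" is realized as a contiguous, cache-friendly array of sampled keys: an exponential search bounds the target segment, and a short linear scan finishes inside it. This is an illustrative reconstruction under those assumptions, not ESL's actual code.

    import bisect

    # Hypothetical sketch: a sorted bottom level plus a sampled "express
    # lane" array; exponential search on the lane, linear scan in the segment.

    class ExpressLaneList:
        SAMPLE = 16                     # every 16th key is promoted to the lane

        def __init__(self, sorted_keys):
            self.bottom = list(sorted_keys)
            self.lane = self.bottom[::self.SAMPLE]   # contiguous, cache-friendly

        def contains(self, key):
            # Exponential search over the express lane to bound the segment.
            bound = 1
            while bound < len(self.lane) and self.lane[bound] <= key:
                bound *= 2
            lo, hi = bound // 2, min(bound, len(self.lane))
            seg = bisect.bisect_right(self.lane, key, lo, hi) - 1
            if seg < 0:
                return False            # key is smaller than every indexed key
            # Linear search within the bottom-level segment.
            start = seg * self.SAMPLE
            end = min(start + self.SAMPLE, len(self.bottom))
            for i in range(start, end):
                if self.bottom[i] == key:
                    return True
                if self.bottom[i] > key:
                    return False
            return False

    esl = ExpressLaneList(range(0, 1000, 3))
    print(esl.contains(999), esl.contains(500))   # True False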
CCA: Cost-Capacity-Aware Caching for In-Memory Data Analytics Frameworks
To process data from IoT and wearable devices, analysis tasks are often offloaded to the cloud. As the amount of sensed data keeps growing, optimizing data analytics frameworks is critical to processing performance. A key approach to speeding up data analytics frameworks in the cloud is caching intermediate data that is used repeatedly in iterative computations. Existing analytics engines implement caching in various ways: some use run-time mechanisms with dynamic profiling, while others rely on programmers to decide which data to cache. Although caching has long been studied in computer systems research, recent data analytics frameworks still leave room for optimization; because sophisticated caching must consider complex execution contexts such as cache capacity, the size of the data to cache, and victims to evict, no general solution exists for data analytics frameworks. In this paper, we propose an application-specific cost-capacity-aware caching scheme for in-memory data analytics frameworks. We use a cost model, built from multiple representative inputs, and an execution flow analysis, extracted from the DAG schedule, to select primary caching candidates among the intermediate data. Once the candidates are determined, the optimal caching is selected automatically during execution, without programmers manually marking intermediate data to cache. We implemented our scheme in Apache Spark and evaluated it on the HiBench benchmarks. Compared to the caching decisions in the original benchmarks, our scheme improves performance by 27% when cache memory is sufficient and by 11% when it is insufficient.
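The flavor of cost-capacity-aware selection can be illustrated with a simple greedy heuristic: rank intermediate datasets by recomputation cost saved per byte and fill the cache budget in that order. This is a sketch of the general idea, not the paper's exact algorithm, and the field names are assumptions.

    # Illustrative cost-capacity-aware cache selection (hypothetical fields).

    def select_cache_candidates(datasets, capacity_bytes):
        """datasets: dicts with 'name', 'size' (bytes), 'compute_cost' (sec),
        and 'reuses' (reads after first materialization)."""
        def benefit_density(d):
            saved = d["compute_cost"] * d["reuses"]   # recomputation avoided
            return saved / d["size"]

        chosen, used = [], 0
        for d in sorted(datasets, key=benefit_density, reverse=True):
            if d["reuses"] > 0 and used + d["size"] <= capacity_bytes:
                chosen.append(d["name"])
                used += d["size"]
        return chosen

    candidates = select_cache_candidates(
        [{"name": "ratings",  "size": 2 << 30, "compute_cost": 40.0, "reuses": 5},
         {"name": "features", "size": 6 << 30, "compute_cost": 15.0, "reuses": 1}],
        capacity_bytes=4 << 30)
    print(candidates)   # ['ratings'] under a 4 GiB budget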
A systematic review of in-memory database over multi-tenancy
Significant cost and time are needed to obtain a comprehensive response, and the response time of a query across a peer-to-peer database is one of the most challenging issues. This is particularly true for large-scale data processing, where the traditional approach of processing data on a single machine may not be sufficient; the need for a scalable, reliable, and secure data processing system is becoming increasingly important. Managing a single in-memory database instance for multiple tenants is often easier than managing a separate database for each tenant. This work focuses on scalability with multi-tenancy and on more efficient, faster query performance using an in-memory database approach. We compare the performance of row-oriented and column-oriented approaches on our benchmark human resources (HR) schema using the Oracle TimesTen in-memory database. We also examine key optimization dimensions, namely the traditional approach, late materialization, compression, and the invisible join, on both column stores (C-Store) and row stores. Enabling compression and late materialization in a query set improves the overall performance of that query set. In particular, the paper aims to elucidate the motivations behind multi-tenant application requirements concerning the database engine and to highlight major in-memory database designs for the tenancy approach in the cloud.
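To show why late materialization on a column store touches less data than a row-store scan, here is a small illustrative sketch (plain Python, not TimesTen) for a query like SELECT name FROM emp WHERE dept = 'HR' AND salary > 50000:

    # Row store: each record is stored (and scanned) as a whole.
    rows = [
        {"name": "Ada", "dept": "HR", "salary": 60000},
        {"name": "Bob", "dept": "IT", "salary": 70000},
        {"name": "Eve", "dept": "HR", "salary": 40000},
    ]

    # Column store: one array per attribute.
    name   = ["Ada", "Bob", "Eve"]
    dept   = ["HR", "IT", "HR"]
    salary = [60000, 70000, 40000]

    # Row store: every predicate evaluation materializes the full record.
    row_result = [r["name"] for r in rows
                  if r["dept"] == "HR" and r["salary"] > 50000]

    # Late materialization: evaluate predicates on the needed columns only,
    # carry row positions along, and fetch 'name' values last.
    positions = [i for i, d in enumerate(dept) if d == "HR"]
    positions = [i for i in positions if salary[i] > 50000]
    col_result = [name[i] for i in positions]

    assert row_result == col_result == ["Ada"]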
Optimizing performance of GATK workflows using Apache Arrow In-Memory data framework
Background: Immense improvements in sequencing technologies make it possible to produce large amounts of high-throughput, cost-effective next-generation sequencing (NGS) data, which needs to be processed efficiently for further downstream analyses. Computing systems need these large data sets close to the processor (with low latency) for fast and efficient processing, yet existing workflows depend heavily on disk storage and access, so processing this data incurs huge disk I/O overheads. Previously, the cost, volatility, and other physical constraints of DRAM made it infeasible to place large working sets in memory. Recent developments in storage-class memory and non-volatile memory technologies, however, enable computing systems to hold huge data sets in memory and process them directly, avoiding disk I/O bottlenecks. To exploit such memory systems efficiently, data must be placed in memory in a proper format and accessed at high throughput, avoiding (de)serialization and copy overheads between processes. For this purpose, we use the newly developed Apache Arrow, a cross-language development framework that provides a language-independent columnar in-memory data format for efficient in-memory big data analytics. It allows genomics applications developed in different programming languages to communicate in memory, without disk storage and without (de)serialization and copy overheads.

Implementation: We integrate the Apache Arrow in-memory representation of the Sequence Alignment/Map (SAM) format, together with its shared-memory object-store library, into widely used genomics high-throughput data processing applications (BWA-MEM, Picard, and GATK) to allow in-memory communication between them. This also lets us exploit the cache locality of tabular data and parallel processing through shared-memory objects.

Results: Adopting the in-memory SAM representation in genomics high-throughput data processing applications yields better system resource utilization, fewer memory accesses thanks to high cache locality, and parallel scalability through shared-memory objects. Our implementation targets the GATK best-practices workflows for germline analysis on whole-genome sequencing (WGS) and whole-exome sequencing (WES) data sets. We compare existing in-memory data placement and sharing techniques, such as ramDisk and Unix pipes, and show that the columnar in-memory representation outperforms both. We achieve speedups of 4.85x and 4.76x in the overall execution time of the variant-calling workflows for WGS and WES data, respectively, and speedups of 1.45x and 1.27x for these data sets compared to the second-fastest workflow. In some individual tools, particularly sorting, duplicate removal, and base quality score recalibration, the speedup is even more promising.

Availability: The code and scripts used in our experiments are available in both container and repository form at: https://github.com/abs-tudelft/ArrowSAM .
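The following pyarrow sketch shows the underlying idea with assumed columns (not ArrowSAM's actual schema), using an Arrow IPC file that a second process memory-maps for zero-copy reads rather than the shared-memory object store the paper integrates:

    import pyarrow as pa
    import pyarrow.compute as pc

    # SAM-like alignment records in Arrow's columnar format; the columns
    # here are illustrative assumptions, not the full SAM specification.
    sam = pa.table({
        "qname": ["r1", "r2"],
        "flag":  pa.array([99, 147], type=pa.int32()),
        "rname": ["chr1", "chr1"],
        "pos":   pa.array([10468, 10500], type=pa.int64()),
        "cigar": ["100M", "100M"],
    })

    # Producer: write the table once in Arrow IPC format (the file could
    # also live in /dev/shm to keep everything in memory).
    with pa.OSFile("sam.arrow", "wb") as sink:
        with pa.ipc.new_file(sink, sam.schema) as writer:
            writer.write_table(sam)

    # Consumer: memory-map the file; columns are accessed without copying
    # or deserializing, which is the overhead Arrow is designed to avoid.
    with pa.memory_map("sam.arrow", "rb") as source:
        shared = pa.ipc.open_file(source).read_all()

    mask = pc.greater(shared["pos"], 10470)
    print(shared.filter(mask)["qname"].to_pylist())   # ['r2']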
Emerging Optical In‐Memory Computing Sensor Synapses Based on Low‐Dimensional Nanomaterials for Neuromorphic Networks
Emerging optical synapses with in-memory computing sensor (IMCS) functionality are considered among the most promising candidates for circumventing the bottleneck of the von Neumann architecture while enabling neuromorphic systems with higher effectiveness and lower energy consumption. The biomimetic properties of optical IMCS synapses, in both function and form, place higher requirements on the functional materials used, such as stronger optical sensitivity and lower energy dissipation. Owing to their high optical sensitivity and excellent electrical conductivity, low-dimensional nanomaterials have received tremendous interest for modulating optically induced synaptic plasticity and emulating optically triggered neuromorphic activity in optical IMCS synapses. Herein, a comprehensive summary of optical IMCS synapses based on low-dimensional nanomaterials, including 0D, 1D, and 2D materials, is presented systematically for the first time. Biomimetic synaptic characteristics, materials classification, operating mechanisms, and neuromorphic applications of these synapses are also summarized. Finally, the challenges and outlook for artificial optical IMCS synapses with low-dimensional nanomaterials are provided.
EA2-IMDG: Efficient Approach of Using an In-Memory Data Grid to Improve the Performance of Replication and Scheduling in Grid Environment Systems
This paper proposes EA2-IMDG (Efficient Approach of Using an In-Memory Data Grid), a novel approach to improving the performance of replication and scheduling in grid environment systems. Grid environments are widely used for distributed computing, but they often face high data access latency and poor scalability. By utilizing an in-memory data grid (IMDG), the approach aims to significantly reduce data access latency and improve resource utilization. It stores data in RAM instead of on disk, allowing faster data retrieval and processing, and distributes data across multiple nodes, which reduces the risk of data bottlenecks and improves the scalability of the system. To evaluate the proposed approach, a series of experiments compared its performance with two baselines: a centralized database and a centralized file system. The results show that EA2-IMDG improves the performance of replication and scheduling tasks by up to 90% in data access latency and up to 50% in resource utilization, suggesting that it is a promising solution for improving the performance of grid environment systems.
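As a rough illustration of what an IMDG does, the sketch below hash-partitions keys across node-local in-RAM dictionaries with one replica for availability. Every name here is hypothetical; production IMDGs (Hazelcast, Ignite, and the like) add far more machinery.

    import hashlib

    # Hypothetical in-memory data grid: hash-partitioned RAM storage with
    # a replica on the next node; a sketch, not a real IMDG client.

    class MiniIMDG:
        def __init__(self, node_count, replicas=1):
            self.nodes = [{} for _ in range(node_count)]
            self.replicas = replicas

        def _owners(self, key):
            h = int(hashlib.sha1(key.encode()).hexdigest(), 16)
            primary = h % len(self.nodes)
            return [(primary + i) % len(self.nodes)
                    for i in range(self.replicas + 1)]

        def put(self, key, value):
            for n in self._owners(key):     # write primary plus replicas
                self.nodes[n][key] = value

        def get(self, key):
            for n in self._owners(key):     # fall back to a replica if needed
                if key in self.nodes[n]:
                    return self.nodes[n][key]
            return None

    grid = MiniIMDG(node_count=4, replicas=1)
    grid.put("job:42", {"state": "scheduled"})
    print(grid.get("job:42"))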
MDB-KCP: persistence framework of in-memory database with CRIU-based container checkpoint in Kubernetes
As demand for container technology and platforms grows with the drive for efficient IT resources, a wide range of workloads is being containerized. Although there are efforts to bring these workloads to Kubernetes, the most widely used container platform today, the nature of containers makes it challenging to support persistence for memory-centric workloads such as in-memory databases. In this paper, we discuss the drawbacks of one persistence mechanism used by in-memory databases in a Kubernetes environment, namely the data snapshot, and propose a compromise solution: container checkpoints. This approach performs checkpointing without the additional memory usage caused by copy-on-write (CoW), which is a problem in fork-based data snapshots during snapshot creation. Container checkpointing also incurs up to 7.1 times less downtime than the main-process-based data snapshot, and during database recovery it achieves up to 11.3 times faster recovery than the data snapshot method.
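For context, here is a minimal sketch (Unix-only Python) of the fork-based data snapshot the paper compares against, in the style Redis uses: the child serializes a consistent copy-on-write view of memory while the parent keeps serving writes, and every page the parent mutates during the dump is duplicated by the OS, which is exactly the memory inflation the paper avoids. The CRIU-based container checkpoint itself is driven by external tooling (e.g., the kubelet checkpoint machinery) and is not shown here.

    import os, json

    data = {f"key:{i}": i for i in range(100_000)}   # in-memory state

    pid = os.fork()
    if pid == 0:
        # Child: sees a consistent CoW snapshot of `data`; dump and exit.
        with open("snapshot.json", "w") as f:
            json.dump(data, f)
        os._exit(0)
    else:
        # Parent: keeps serving writes; each page it touches while the
        # child is dumping gets physically copied, inflating memory usage.
        data["key:0"] = -1
        os.waitpid(pid, 0)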
DAFuzz: data-aware fuzzing of in-memory data stores
Fuzzing has become an important method for finding vulnerabilities in software. For fuzzing programs that expect structured inputs, syntax- and semantics-aware approaches have been proposed, but they still cannot fuzz in-memory data stores sufficiently, since some code paths are only executed when the required data are available. In this article, we propose DAFuzz, a data-aware fuzzing method designed around the data used during fuzzing. Specifically, to ensure that different data-sensitive code paths are exercised, DAFuzz first loads different kinds of data into the store before feeding fuzzing inputs. Then, when generating inputs, DAFuzz ensures they are not only syntactically and semantically valid but also use the data correctly. We implemented a prototype of DAFuzz based on Superion and used it to fuzz Redis and Memcached. Experiments show that DAFuzz covers 13% to 95% more edges than AFL, Superion, AFL++, and AFLNet, and discovers vulnerabilities over 2.7× faster. In total, we discovered four new vulnerabilities in Redis and Memcached, all of which were reported to the developers and have been acknowledged and fixed.
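The data-aware idea can be sketched as follows: seed the store with typed data first, then generate commands whose key and type usage is consistent with that seed, so type-specific code paths actually execute. This is a toy illustration of the concept under assumed Redis-like commands, not DAFuzz itself.

    import random

    SEED_DATA = {           # key -> value type preloaded into the store
        "user:1": "string",
        "queue":  "list",
        "tags":   "set",
    }

    COMMANDS_BY_TYPE = {    # commands valid for each value type, with arity
        "string": [("GET", 1), ("APPEND", 2), ("SETRANGE", 3)],
        "list":   [("LPUSH", 2), ("LRANGE", 3), ("LPOP", 1)],
        "set":    [("SADD", 2), ("SMEMBERS", 1), ("SREM", 2)],
    }

    def gen_input(rng):
        # Pick a seeded key, then a command that matches its type, so the
        # generated input uses the preloaded data correctly.
        key, vtype = rng.choice(list(SEED_DATA.items()))
        cmd, arity = rng.choice(COMMANDS_BY_TYPE[vtype])
        args = [key] + [str(rng.randint(0, 9)) for _ in range(arity - 1)]
        return " ".join([cmd] + args)

    rng = random.Random(0)
    for _ in range(3):
        print(gen_input(rng))   # e.g. "LPUSH queue 6"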
Business value of in-memory technology – multiple-case study insights
Purpose – The purpose of this paper is to assess the business value of in-memory computing (IMC) technology by analyzing its organizational impact in different application scenarios.

Design/methodology/approach – This research applies a multiple-case study methodology, analyzing five cases of IMC application scenarios in five large European industrial and service-sector companies.

Findings – Results show that IMC can deliver business value in applications ranging from advanced analytic insights to support for real-time processes. This enables higher-level organizational advantages such as data-driven decision making, superior transparency of operations, and experience with Big Data technology. The findings are summarized in a business value generation model that captures the business benefits along with the preceding enabling changes in the organizational environment.

Practical implications – The results aid managers in identifying application scenarios where IMC technology may generate value for their organizations from business and IT management perspectives. The research also sheds light on the socio-technical factors that influence the likelihood of success or failure of IMC initiatives.

Originality/value – This research is among the first to model the business value creation process of in-memory technology based on insights from multiple implemented applications in different industries.