806 result(s) for "Hadoop"
Hadoop for dummies
Let Hadoop For Dummies help you harness the power of your data and rein in the information overload. Big data has become big business, and companies and organizations of all sizes are struggling to find ways to retrieve valuable information from their massive data sets without becoming overwhelmed. Enter Hadoop and this easy-to-understand For Dummies guide. Hadoop For Dummies helps readers understand the value of big data, make a business case for using Hadoop, navigate the Hadoop ecosystem, and build and manage Hadoop applications and clusters.
Big Data: A Survey
In this paper, we review the background and state of the art of big data. We first introduce the general background of big data and review related technologies, such as cloud computing, the Internet of Things, data centers, and Hadoop. We then focus on the four phases of the big data value chain: data generation, data acquisition, data storage, and data analysis. For each phase, we introduce the general background, discuss the technical challenges, and review the latest advances. We finally examine several representative applications of big data, including enterprise management, the Internet of Things, online social networks, medical applications, collective intelligence, and the smart grid. These discussions aim to give readers a comprehensive, big-picture overview of this exciting area. The survey concludes with a discussion of open problems and future directions.
Pro Hadoop data analytics : designing and building big data systems using the Hadoop ecosystem
"Learn advanced analytical techniques and leverage existing toolkits to make your analytic applications more powerful, precise, and efficient. This book provides the right combination of architecture, design, and implementation information to create analytical systems which go beyond the basics of classification, clustering, and recommendation" -- All IT eBooks website.
MapReduce model for efficient image retrieval: a Hadoop-based framework
The rapid proliferation of images, driven by advancements in image-capturing technologies, poses significant challenges to the efficient management and retrieval of images from vast databases. Traditional Content-Based Image Retrieval (CBIR) systems struggle with scalability and complexity, resulting in suboptimal retrieval performance. To address these challenges, this paper explores the integration of CBIR with Hadoop, a well-established distributed computing framework. Hadoop's capability to handle large-scale data processing, utilizing its MapReduce model and Hadoop Distributed File System (HDFS), offers a promising solution to enhance the effectiveness and scalability of image retrieval tasks. Our solution combines the strengths of CBIR and Hadoop, using our proposed Full Directional Local Neighbor Pattern (FDLNP) method for feature extraction. This method captures local patterns, color, texture, and directional information to provide a comprehensive image representation, significantly improving retrieval accuracy. We present a detailed design and implementation of this integrated system, emphasizing its two main phases: the offline phase, which constructs the feature database using FDLNP and MapReduce jobs, and the online phase, which extracts features from query images and calculates similarity distances using parallel processing. Experimental results on a Hadoop cluster reveal a significant improvement in processing efficiency, particularly for large datasets, highlighting the advantages of distributed processing in managing extensive image retrieval tasks. The findings indicate that while single-node systems may be suitable for smaller datasets, Hadoop clusters are preferable for larger-scale image databases due to their scalability and concurrent processing capabilities. 
Integrating CBIR with Hadoop, enhanced by our FDLNP method, provides a powerful tool for organizing and searching images in vast databases, offering improved retrieval performance, scalability, fault tolerance, and reliability—key attributes for large-scale image retrieval systems.
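The offline/online split this abstract describes can be sketched in miniature. The gray-level histogram below is only a hypothetical stand-in for the paper's FDLNP descriptor, and everything runs in-process rather than as MapReduce jobs over HDFS; the sketch mirrors the data flow (build a feature database offline, rank by similarity distance online), not the method itself.

```python
from collections import Counter
from math import sqrt

# Toy stand-in for FDLNP: a 4-bin gray-level histogram. The real descriptor
# (local patterns plus color, texture, and direction) is far richer.
def extract_features(pixels):
    hist = Counter(p // 64 for p in pixels)          # bins 0..3
    total = len(pixels)
    return [hist.get(b, 0) / total for b in range(4)]

# Offline phase: emit (image_id, feature_vector) for every image;
# on a cluster this would be a MapReduce job writing to HDFS.
def build_feature_db(images):
    return {img_id: extract_features(px) for img_id, px in images.items()}

# Online phase: extract the query's features, then compute similarity
# distances against every stored vector (the scan mappers parallelize).
def query(feature_db, query_pixels, top_k=2):
    q = extract_features(query_pixels)
    dist = lambda v: sqrt(sum((a - b) ** 2 for a, b in zip(q, v)))
    ranked = sorted(feature_db.items(), key=lambda kv: dist(kv[1]))
    return [img_id for img_id, _ in ranked[:top_k]]

db = build_feature_db({"dark": [10] * 8, "light": [250] * 8, "mid": [120] * 8})
query(db, [15] * 8)  # a dark query image ranks "dark" first
```

In the distributed version, each mapper would handle a shard of the feature database and a reducer would merge the per-shard top-k lists, which is where the scalability gains the abstract reports come from.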
Big data with cloud computing: Discussions and challenges
With recent advancements in computer technologies, the amount of available data is growing day by day, and such excessive volumes of data create great challenges for users. Meanwhile, cloud computing services provide a powerful environment in which to store large volumes of data, eliminating requirements such as dedicated space and the maintenance of expensive computer hardware and software. Handling big data is a time-consuming task that requires large computational clusters to ensure successful data storage and processing. In this work, the definition, classification, and characteristics of big data are discussed, along with various cloud services, such as Microsoft Azure, Google Cloud, Amazon Web Services, IBM Cloud, Hortonworks, and MapR. A comparative analysis of various cloud-based big data frameworks is also performed. Various research challenges are defined in terms of distributed database storage, data security, heterogeneity, and data visualization.
Getting started with Kudu : perform fast analytics on fast data
"Begun as an internal project at Cloudera, Kudu is an open source solution compatible with many data processing frameworks in the Hadoop environment. In this book, current and former solutions professionals from Cloudera provide use cases, examples, best practices, and sample code"--Page 4 of cover.
Hadoop in Banking: Event‐Driven Performance Evaluation
In today's data-intensive environment, performance evaluation in the banking industry depends on timely and accurate insights, leading to better decision making and operational efficiency. Traditional methods for assessing bank performance often fall short when handling the volume, velocity, and variety of data generated in real time. This study proposes an event-driven approach for performance evaluation in banking, built on a Hadoop-based architecture. Infused with real-time event analytics, this scalable framework can process and analyze fast-moving transactional data, allowing banks to monitor key performance indicators and detect operational anomalies in real time. This is supported by the Hadoop ecosystem, which distributes both processing and storage, making it fit for handling large datasets with high fault tolerance and a high degree of parallel computation. The study analyzes transaction and user engagement data using Hive queries, focusing on credit card transactions via MasterCard. Two cases are examined: a detailed snapshot of individual transactions and a five-day trend analysis. Metrics such as active users, card registrations, and retention are visualized through dashboards. The findings reveal user activity patterns and areas for improvement, emphasizing scalable, data-driven approaches for transaction analytics. The framework offers banks a practical way to exploit extensive data-analytic capabilities, and to add any further metrics they require, in pursuit of competitive advantage and business survivability. The findings suggest that Hadoop-integrated, event-driven analytics could be a game changer for performance evaluation in the banking sector.
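The metrics this study names (active users, retention) are straightforward aggregations. As a hedged, in-memory sketch of what the Hive queries would compute over the transaction tables (the field names here are illustrative, not the paper's actual schema):

```python
from datetime import date

# Toy transaction log standing in for the Hive transaction tables.
transactions = [
    {"user": "u1", "day": date(2024, 1, 1), "amount": 40.0},
    {"user": "u2", "day": date(2024, 1, 1), "amount": 15.5},
    {"user": "u1", "day": date(2024, 1, 2), "amount": 9.9},
    {"user": "u3", "day": date(2024, 1, 2), "amount": 120.0},
]

def active_users(txns, day):
    """Distinct users transacting on a day (a COUNT(DISTINCT ...) in Hive)."""
    return {t["user"] for t in txns if t["day"] == day}

def retention(txns, day_a, day_b):
    """Share of day-A users who also transacted on day B."""
    a, b = active_users(txns, day_a), active_users(txns, day_b)
    return len(a & b) / len(a) if a else 0.0

retention(transactions, date(2024, 1, 1), date(2024, 1, 2))  # 0.5: u1 returns, u2 does not
```

On the Hadoop side these group-bys run as distributed jobs, which is what keeps the five-day trend analysis tractable as the transaction volume grows.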
Learning Apache Drill : query and analyze distributed data sources with SQL
Get up to speed with Apache Drill, an extensible distributed SQL query engine that reads massive datasets in many popular file formats such as Parquet, JSON, and CSV. Drill reads data in HDFS or in cloud-native storage such as S3 and works with Hive metastores along with distributed databases such as HBase, MongoDB, and relational databases. Drill works everywhere: on your laptop or in your largest cluster. In this practical book, Drill committers Charles Givre and Paul Rogers show analysts and data scientists how to query and analyze raw data using this powerful tool. Data scientists today spend about 80% of their time just gathering and cleaning data. With this book, you'll learn how Drill helps you analyze data more effectively to drive down time to insight. Use Drill to clean, prepare, and summarize delimited data for further analysis ; Query file types including logfiles, Parquet, JSON, and other complex formats ; Query Hadoop, relational databases, MongoDB, and Kafka with standard SQL ; Connect to Drill programmatically using a variety of languages ; Use Drill even with challenging or ambiguous file formats ; Perform sophisticated analysis by extending Drill's functionality with user-defined functions ; Facilitate data analysis for network security, image metadata, and machine learning
A Review on Big Data Optimization Techniques
Analysis of representative tools for SQL query processing on Hadoop (SQL-on-Hadoop systems), such as Hive, Impala, Presto, and Shark, shows that they are still not sufficiently efficient for complex analytical queries and interactive query processing. Existing SQL-on-Hadoop systems could benefit greatly from the application of modern query processing techniques that have been studied extensively for many years in the database community, and it is expected that applying such advanced techniques can improve their performance. The main idea of this paper is to give a review of big data concepts and technologies, and to summarize big data optimization techniques that can be used to improve performance when processing big data.
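One classic database-community technique of the kind this review points to is predicate pushdown: applying a filter before a join rather than after it, so the join builds a smaller intermediate result. A toy, in-memory illustration (the tables and field names are invented for the example):

```python
# Two tiny "tables"; a real SQL-on-Hadoop engine would scan these from
# HDFS, and pushing the predicate down also lets it skip whole files.
orders = [{"id": 1, "cust": "a", "year": 2023},
          {"id": 2, "cust": "b", "year": 2024},
          {"id": 3, "cust": "a", "year": 2024}]
customers = [{"cust": "a", "region": "EU"}, {"cust": "b", "region": "US"}]

def join(left, right, key):
    # Nested-loop equi-join: examines len(left) * len(right) pairs.
    return [{**l, **r} for l in left for r in right if l[key] == r[key]]

# Naive plan: join first, filter later (3 x 2 pairs examined).
naive = [row for row in join(orders, customers, "cust") if row["year"] == 2024]

# Pushed-down plan: filter orders first, then join (2 x 2 pairs examined).
pushed = join([o for o in orders if o["year"] == 2024], customers, "cust")

assert naive == pushed  # same answer, smaller input to the join
```

The saving here is trivial, but at Hadoop scale the filtered relation can be orders of magnitude smaller than the raw one, which is why optimizations like this matter for interactive SQL-on-Hadoop workloads.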