806 result(s) for "Hadoop"
Hadoop for dummies
Let Hadoop For Dummies help you harness the power of your data and rein in the information overload. Big data has become big business, and companies and organizations of all sizes are struggling to find ways to retrieve valuable information from their massive data sets without becoming overwhelmed. Enter Hadoop and this easy-to-understand For Dummies guide. Hadoop For Dummies helps readers understand the value of big data, make a business case for using Hadoop, navigate the Hadoop ecosystem, and build and manage Hadoop applications and clusters.
Big Data: A Survey
In this paper, we review the background and state of the art of big data. We first introduce the general background of big data and review related technologies, such as cloud computing, the Internet of Things, data centers, and Hadoop. We then focus on the four phases of the big data value chain: data generation, data acquisition, data storage, and data analysis. For each phase, we introduce the general background, discuss the technical challenges, and review the latest advances. We finally examine several representative applications of big data, including enterprise management, the Internet of Things, online social networks, medical applications, collective intelligence, and the smart grid. These discussions aim to give readers a comprehensive, big-picture overview of this exciting area. The survey concludes with a discussion of open problems and future directions.
Pro Hadoop data analytics : designing and building big data systems using the Hadoop ecosystem
"Learn advanced analytical techniques and leverage existing toolkits to make your analytic applications more powerful, precise, and efficient. This book provides the right combination of architecture, design, and implementation information to create analytical systems which go beyond the basics of classification, clustering, and recommendation" -- All IT eBooks website.
MapReduce model for efficient image retrieval: a Hadoop-based framework
The rapid proliferation of images, driven by advancements in image-capturing technologies, poses significant challenges to the efficient management and retrieval of images from vast databases. Traditional Content-Based Image Retrieval (CBIR) systems struggle with scalability and complexity, resulting in suboptimal retrieval performance. To address these challenges, this paper explores the integration of CBIR with Hadoop, a well-established distributed computing framework. Hadoop's capability to handle large-scale data processing, utilizing its MapReduce model and Hadoop Distributed File System (HDFS), offers a promising solution to enhance the effectiveness and scalability of image retrieval tasks. Our solution combines the strengths of CBIR and Hadoop, using our proposed Full Directional Local Neighbor Pattern (FDLNP) method for feature extraction. This method captures local patterns, color, texture, and directional information to provide a comprehensive image representation, significantly improving retrieval accuracy. We present a detailed design and implementation of this integrated system, emphasizing its two main phases: the offline phase, which constructs the feature database using FDLNP and MapReduce jobs, and the online phase, which extracts features from query images and calculates similarity distances using parallel processing. Experimental results on a Hadoop cluster reveal a significant improvement in processing efficiency, particularly for large datasets, highlighting the advantages of distributed processing in managing extensive image retrieval tasks. The findings indicate that while single-node systems may be suitable for smaller datasets, Hadoop clusters are preferable for larger-scale image databases due to their scalability and concurrent processing capabilities. 
Integrating CBIR with Hadoop, enhanced by our FDLNP method, provides a powerful tool for organizing and searching images in vast databases, offering improved retrieval performance, scalability, fault tolerance, and reliability—key attributes for large-scale image retrieval systems.
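The offline/online split this abstract describes can be sketched in miniature. The gray-level histogram below is only a hypothetical stand-in for the paper's FDLNP descriptor, and everything runs in-process rather than as MapReduce jobs over HDFS; the sketch mirrors the data flow (build a feature database offline, rank by similarity distance online), not the method itself.

```python
from collections import Counter
from math import sqrt

# Toy stand-in for FDLNP: a 4-bin gray-level histogram. The real descriptor
# (local patterns plus color, texture, and direction) is far richer.
def extract_features(pixels):
    hist = Counter(p // 64 for p in pixels)          # bins 0..3
    total = len(pixels)
    return [hist.get(b, 0) / total for b in range(4)]

# Offline phase: emit (image_id, feature_vector) for every image;
# on a cluster this would be a MapReduce job writing to HDFS.
def build_feature_db(images):
    return {img_id: extract_features(px) for img_id, px in images.items()}

# Online phase: extract the query's features, then compute similarity
# distances against every stored vector (the scan mappers parallelize).
def query(feature_db, query_pixels, top_k=2):
    q = extract_features(query_pixels)
    dist = lambda v: sqrt(sum((a - b) ** 2 for a, b in zip(q, v)))
    ranked = sorted(feature_db.items(), key=lambda kv: dist(kv[1]))
    return [img_id for img_id, _ in ranked[:top_k]]

db = build_feature_db({"dark": [10] * 8, "light": [250] * 8, "mid": [120] * 8})
query(db, [15] * 8)  # a dark query image ranks "dark" first
```

In the distributed version, each mapper would handle a shard of the feature database and a reducer would merge the per-shard top-k lists, which is where the scalability gains the abstract reports come from.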
Big data with cloud computing: Discussions and challenges
With recent advancements in computer technologies, the amount of available data is growing day by day, and such excessive volumes of data create great challenges for users. Meanwhile, cloud computing services provide a powerful environment in which to store large volumes of data, eliminating requirements such as dedicated space and the maintenance of expensive computer hardware and software. Handling big data is a time-consuming task that requires large computational clusters to ensure successful data storage and processing. In this work, the definition, classification, and characteristics of big data are discussed, along with various cloud services, such as Microsoft Azure, Google Cloud, Amazon Web Services, IBM Cloud, Hortonworks, and MapR. A comparative analysis of various cloud-based big data frameworks is also performed. Various research challenges are defined in terms of distributed database storage, data security, heterogeneity, and data visualization.
Getting started with Kudu : perform fast analytics on fast data
"Begun as an internal project at Cloudera, Kudu is an open source solution compatible with many data processing frameworks in the Hadoop environment. In this book, current and former solutions professionals from Cloudera provide use cases, examples, best practices, and sample code"--Page 4 of cover.
Hadoop in Banking: Event‐Driven Performance Evaluation
In today's data-intensive environment, performance evaluation in the banking industry depends on timely and accurate insights, leading to better decision making and operational efficiency. Traditional methods for assessing bank performance often fall short when handling the volume, velocity, and variety of data generated in real time. This study proposes an event-driven approach for performance evaluation in banking, built on a Hadoop-based architecture. Infused with real-time event analytics, this scalable framework can process and analyze fast-moving transactional data, allowing banks to monitor key performance indicators and detect operational anomalies in real time. This is supported by the Hadoop ecosystem, which distributes both processing and storage, making it fit for handling large datasets with high fault tolerance and a high degree of parallel computation. The study analyzes transaction and user engagement data using Hive queries, focusing on credit card transactions via MasterCard. Two cases are examined: a detailed snapshot of individual transactions and a five-day trend analysis. Metrics such as active users, card registrations, and retention are visualized through dashboards. The findings reveal user activity patterns and areas for improvement, emphasizing scalable, data-driven approaches for transaction analytics. The framework offers banks a practical way to exploit extensive data-analytic capabilities, and to add any further metrics they require, in pursuit of competitive advantage and business survivability. The findings suggest that Hadoop-integrated, event-driven analytics could be a game changer for performance evaluation in the banking sector.
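The metrics this study names (active users, retention) are straightforward aggregations. As a hedged, in-memory sketch of what the Hive queries would compute over the transaction tables (the field names here are illustrative, not the paper's actual schema):

```python
from datetime import date

# Toy transaction log standing in for the Hive transaction tables.
transactions = [
    {"user": "u1", "day": date(2024, 1, 1), "amount": 40.0},
    {"user": "u2", "day": date(2024, 1, 1), "amount": 15.5},
    {"user": "u1", "day": date(2024, 1, 2), "amount": 9.9},
    {"user": "u3", "day": date(2024, 1, 2), "amount": 120.0},
]

def active_users(txns, day):
    """Distinct users transacting on a day (a COUNT(DISTINCT ...) in Hive)."""
    return {t["user"] for t in txns if t["day"] == day}

def retention(txns, day_a, day_b):
    """Share of day-A users who also transacted on day B."""
    a, b = active_users(txns, day_a), active_users(txns, day_b)
    return len(a & b) / len(a) if a else 0.0

retention(transactions, date(2024, 1, 1), date(2024, 1, 2))  # 0.5: u1 returns, u2 does not
```

On the Hadoop side these group-bys run as distributed jobs, which is what keeps the five-day trend analysis tractable as the transaction volume grows.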
Learning Apache Drill : query and analyze distributed data sources with SQL
Get up to speed with Apache Drill, an extensible distributed SQL query engine that reads massive datasets in many popular file formats such as Parquet, JSON, and CSV. Drill reads data in HDFS or in cloud-native storage such as S3 and works with Hive metastores along with distributed databases such as HBase, MongoDB, and relational databases. Drill works everywhere: on your laptop or in your largest cluster. In this practical book, Drill committers Charles Givre and Paul Rogers show analysts and data scientists how to query and analyze raw data using this powerful tool. Data scientists today spend about 80% of their time just gathering and cleaning data. With this book, you'll learn how Drill helps you analyze data more effectively to drive down time to insight. Use Drill to clean, prepare, and summarize delimited data for further analysis ; Query file types including logfiles, Parquet, JSON, and other complex formats ; Query Hadoop, relational databases, MongoDB, and Kafka with standard SQL ; Connect to Drill programmatically using a variety of languages ; Use Drill even with challenging or ambiguous file formats ; Perform sophisticated analysis by extending Drill's functionality with user-defined functions ; Facilitate data analysis for network security, image metadata, and machine learning
A Review on Big Data Optimization Techniques
Analysis of representative tools for SQL query processing on Hadoop (SQL-on-Hadoop systems), such as Hive, Impala, Presto, and Shark, shows that they are still not sufficiently efficient for complex analytical queries and interactive query processing. Existing SQL-on-Hadoop systems could benefit greatly from the application of modern query processing techniques that have been studied extensively for many years in the database community, and it is expected that applying such advanced techniques can improve their performance. The main idea of this paper is to give a review of big data concepts and technologies, and to summarize big data optimization techniques that can be used to improve performance when processing big data.
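One classic database-community technique of the kind this review points to is predicate pushdown: applying a filter before a join rather than after it, so the join builds a smaller intermediate result. A toy, in-memory illustration (the tables and field names are invented for the example):

```python
# Two tiny "tables"; a real SQL-on-Hadoop engine would scan these from
# HDFS, and pushing the predicate down also lets it skip whole files.
orders = [{"id": 1, "cust": "a", "year": 2023},
          {"id": 2, "cust": "b", "year": 2024},
          {"id": 3, "cust": "a", "year": 2024}]
customers = [{"cust": "a", "region": "EU"}, {"cust": "b", "region": "US"}]

def join(left, right, key):
    # Nested-loop equi-join: examines len(left) * len(right) pairs.
    return [{**l, **r} for l in left for r in right if l[key] == r[key]]

# Naive plan: join first, filter later (3 x 2 pairs examined).
naive = [row for row in join(orders, customers, "cust") if row["year"] == 2024]

# Pushed-down plan: filter orders first, then join (2 x 2 pairs examined).
pushed = join([o for o in orders if o["year"] == 2024], customers, "cust")

assert naive == pushed  # same answer, smaller input to the join
```

The saving here is trivial, but at Hadoop scale the filtered relation can be orders of magnitude smaller than the raw one, which is why optimizations like this matter for interactive SQL-on-Hadoop workloads.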