Catalogue Search | MBRL

PySpark recipes : a problem-solution approach with PySpark2

by Mishra, Raju Kumar, author in Python (Computer program language) , SPARK (Computer program language)

Quickly find solutions to common programming problems encountered while processing big data. Content is presented in the popular problem-solution format. Look up the programming problem that you want to solve. Read the solution. Apply the solution directly in your own code. Problem solved! PySpark Recipes covers Hadoop and its shortcomings. The architecture of Spark, PySpark, and RDD are presented. You will learn to apply RDD to solve day-to-day big data problems. Python and NumPy are included and make it easy for new learners of PySpark to understand and adopt the model. What You Will Learn: Understand the advanced features of PySpark and SparkSQL Optimize your code Program SparkSQL with Python Use Spark Streaming and Spark MLlib with Python Perform graph analysis with GraphFrames.

Book

Share this book

Add to My Shelf

Machine Learning with Spark™ and Python® - Essential Techniques for Predictive Analytics (2nd Edition)

by Bowles, Michael in General Engineering & Project Administration , General References , Machine learning

2020,2019

This book simplifies ML for practical uses by focusing on two key algorithms. This new second edition improves with the addition of Spark-a ML framework from the Apache foundation. By implementing Spark, machine learning students can easily process much large data sets and call the spark algorithms using ordinary Python code. This book focuses on two algorithm families (linear methods and ensemble methods) that effectively predict outcomes. This type of problem covers many use cases such as what ad to place on a web page, predicting prices in securities markets, or detecting credit card fraud. The focus on two families gives enough room for full descriptions of the mechanisms at work in the algorithms. Then the code examples serve to illustrate the workings of the machinery with specific hackable code.

eBook

Share this book

Add to My Shelf

Machine learning with PySpark : with natural language processing and recommender systems

by Singh, Pramod, 1954- author in Machine learning. , Application software Development. , Python (Computer program language)

Book

Share this book

Add to My Shelf

PySpark Cookbook

by Drabas, Tomasz , Lee, Denny in Application software-Development , COM018000 COMPUTERS / Data Processing , COMPUTERS / Data Modeling & Design

2018

This cookbook presents recipes on leveraging the power of Python and putting it to use in the Apache Spark ecosystem. By the end of this book, you will be able to solve any problem associated with building effective, data-intensive applications and performing machine learning and structured streaming using PySpark.

eBook

Share this book

Add to My Shelf

Practical predictive analytics : back to the future with R, Spark, and more!

by Winters, Ralph, author in Spark (Electronic resource : Apache Software Foundation) , R (Computer program language) , Data mining Data processing.

Book

Share this book

Add to My Shelf

Introducción a Apache Spark

by Macías, Mario in Computer systems , SPARK (Computer program language)

2015

eBook

Share this book

Add to My Shelf

Hands-on data science and Python machine learning : perform data mining and machine learning efficiently using Python and Spark

by Kane, Frank in Spark (Electronic resource : Apache Software Foundation) , Python (Computer program language) , Machine learning.

Book

Share this book

Add to My Shelf

Apache Spark : streaming with Python and PySpark

by McAteer, Matthew P in Big data , Instructional films , Python (Computer program language)

2018

Spark Streaming is becoming incredibly popular, and with good reason. According to IBM, 90% of the data in the World today was created in the last two years alone. Our current output of data is roughly 2.5 quintillion bytes per day. The World is being immersed in data, more so each and every day. As such, analyzing static DataFrames for non-dynamic data is becoming less and less of a practical approach to more and more problems. This is where data streaming comes in, the ability to process data almost as soon as it's produced, recognizing the time-dependency of the data. Apache Spark Streaming gives us an unlimited ability to build cutting-edge applications. It is also one of the most compelling technologies of the last decade in terms of its disruption in the big data world. Spark provides in-memory cluster computing, which greatly boosts the speed of iterative algorithms and interactive data mining tasks. Spark also is a powerful engine for streaming data as well as processing it. The synergy between them makes Spark an ideal tool for processing gargantuan data fire hoses. Tons of companies, including Fortune 500 companies, are adapting Apache Spark Streaming to extract meaning from massive data streams; today, you have access to that same big data technology right on your desktop. This Apache Spark Streaming course is taught in Python. Python is currently one of the most popular programming languages in the World! Its rich data community, offering vast amounts of toolkits and features, makes it a powerful tool for data processing. Using PySpark (the Python API for Spark), you will be able to interact with Apache Spark Streaming's main abstraction, RDDs, as well as other Spark components, such as Spark SQL and much more! Let's learn how to write Apache Spark Streaming programs with PySpark Streaming to process big data sources today!

Streaming Video

Share this book

Add to My Shelf

Efficient processing of complex XSD using Hive and Spark

by Luján-Mora, Sergio , Martinez-Mosquera, Diana , Navarrete, Rosa in Algorithms and Analysis of Algorithms , Big Data , Case studies

2021

The eXtensible Markup Language (XML) files are widely used by the industry due to their flexibility in representing numerous kinds of data. Multiple applications such as financial records, social networks, and mobile networks use complex XML schemas with nested types, contents, and/or extension bases on existing complex elements or large real-world files. A great number of these files are generated each day and this has influenced the development of Big Data tools for their parsing and reporting, such as Apache Hive and Apache Spark. For these reasons, multiple studies have proposed new techniques and evaluated the processing of XML files with Big Data systems. However, a more usual approach in such works involves the simplest XML schemas, even though, real data sets are composed of complex schemas. Therefore, to shed light on complex XML schema processing for real-life applications with Big Data tools, we present an approach that combines three techniques. This comprises three main methods for parsing XML files: cataloging, deserialization, and positional explode. For cataloging, the elements of the XML schema are mapped into root, arrays, structures, values, and attributes. Based on these elements, the deserialization and positional explode are straightforwardly implemented. To demonstrate the validity of our proposal, we develop a case study by implementing a test environment to illustrate the methods using real data sets provided from performance management of two mobile network vendors. Our main results state the validity of the proposed method for different versions of Apache Hive and Apache Spark, obtain the query execution times for Apache Hive internal and external tables and Apache Spark data frames, and compare the query performance in Apache Hive with that of Apache Spark. Another contribution made is a case study in which a novel solution is proposed for data analysis in the performance management systems of mobile networks.

Journal Article

Share this book

Add to My Shelf

Moriarty: Improving ‘Time To Market’ in big data and Artificial intelligence applications

by Peña, P. , Calvo, J.I. , Rodrigálvarez, V. in Algorithms , Analytics , Artificial intelligence

2016

The objective of this paper is to present the Moriarty framework and show one use case of the recommenda tion of entertainment events. Moriarty is a tool that can generate Big Data near real-time analytics solutions (Streaming Analytics). This new tool makes possible the collaboration among the data scientist and the software engineer. Through Moriarty, they join forces for the rapid generation of new software solutions. The data scientist works with algorithms and data transformations using a visual interface, while the software engineer works with the idea of services to be invoked. The underlying idea is that a user can build projects of Artificial Intelligence and Data Analytics without having to make any line of code. The main power of the tool is to reduce the 'time to market' in an application which embeds complex algorithms of Artificial Intelligence. It is based on different Artificial Intelligence algorithms (like Deep Learning, Natural Language Processing and Semantic Web) and Big Data modules (Spark as a distributed data engine and access to NoSQL databases). Moriarty is divided into several layers; its core is a BPMN engine, which executes the processing and defines data analytics process, called workflows. Each workflow is defined by the standard BPMN model and is linked to a set of reusable functions or Artificial Intelligence algorithms written following a service-oriented architecture. An example of service presented is a recommendation application of restaurants, concerts, entertainment and events in general, where information is collected from social networks and websites, is processed by Natural Language Processing algorithms and finally introduced into a graph database.

Journal Article

Share this book

Add to My Shelf

Language Selector

MBRLGlobalSearch

Language Selector

Catalogue Search | MBRL

Search Results Heading

Explore the vast range of titles available.

MBRLSearchResults

MBRLHappinessMeter