Catalogue Search | MBRL

Accelerating discovery : mining unstructured information for hypothesis generation

by Spangler, Scott, author in Data mining. , Science Information resources. , Science Methodology.

Book

Adaptations of data mining methodologies: a systematic literature review

by Milani, Fredrik , Plotnikova, Veronika , Dumas, Marlon in Algorithms , Analysis , Artificial intelligence

2020

The use of end-to-end data mining methodologies such as CRISP-DM, KDD process, and SEMMA has grown substantially over the past decade. However, little is known as to how these methodologies are used in practice. In particular, the question of whether data mining methodologies are used ‘as-is’ or adapted for specific purposes, has not been thoroughly investigated. This article addresses this gap via a systematic literature review focused on the context in which data mining methodologies are used and the adaptations they undergo. The literature review covers 207 peer-reviewed and ‘grey’ publications. We find that data mining methodologies are primarily applied ‘as-is’. At the same time, we also identify various adaptations of data mining methodologies and we note that their number is growing rapidly. The dominant adaptations pattern is related to methodology adjustments at a granular level (modifications) followed by extensions of existing methodologies with additional elements. Further, we identify two recurrent purposes for adaptation: (1) adaptations to handle Big Data technologies, tools and environments (technological adaptations); and (2) adaptations for context-awareness and for integrating data mining solutions into business processes and IT systems (organizational adaptations). The study suggests that standard data mining methodologies do not pay sufficient attention to deployment issues, which play a prominent role when turning data mining models into software products that are integrated into the IT architectures and business processes of organizations. We conclude that refinements of existing methodologies aimed at combining data, technological, and organizational aspects, could help to mitigate these gaps.

Journal Article

Big data concepts, technologies, and applications

by Husain, Mohammad Shahid, 1981- author , Khan, Mohammad Zunnun, 1988- author , Siddiqui, Tamanna, author in Big data. , Data mining Methodology. , Données volumineuses.

2024

Book

Methodologies of Knowledge Discovery from Data and Data Mining Methods in Mechanical Engineering

by Rogalewicz, Michał , Sika, Robert in Data Mining methodology , Data Mining methods , knowledge discovery

2016

The paper contains a review of methodologies of a process of knowledge discovery from data and methods of data exploration (Data Mining), which are the most frequently used in mechanical engineering. The methodologies contain various scenarios of data exploring, while DM methods are used in their scope. The paper shows premises for use of DM methods in industry, as well as their advantages and disadvantages. Development of methodologies of knowledge discovery from data is also presented, along with a classification of the most widespread Data Mining methods, divided by type of realized tasks. The paper is summarized by presentation of selected Data Mining applications in mechanical engineering.

Journal Article

Learn data analysis with Python : lessons in coding

by Henley, A. J., author , Wolf, Dave (Software developer), author in Python (Computer program language) , Programming languages (Electronic computers) , Data mining.

"Get started using Python in data analysis with this compact practical guide. This book includes three exercises and a case study on getting data in and out of Python code in the right format. Learn Data Analysis with Python also helps you discover meaning in the data using analysis and shows you how to visualize it. Each lesson is, as much as possible, self-contained to allow you to dip in and out of the examples as your needs dictate. If you are already using Python for data analysis, you will find a number of things that you wish you knew how to do in Python. You can then take these techniques and apply them directly to your own projects. If you aren't using Python for data analysis, this book takes you through the basics at the beginning to give you a solid foundation in the topic. As you work your way through the book you will have a better idea of how to use Python for data analysis when you are finished. You will: get data into and out of Python code, prepare the data and its format, find the meaning of the data, visualize the data using iPython."--Provided by publisher.

Book

Predictive models in Alzheimer's disease: an evaluation based on data mining techniques

by Yactayo-Arias, Cesar , Andrade-Arenas, Laberiano , Rubio-Paucar, Inoc

2024

The increasing prevalence of Alzheimer's disease in older adults has raised significant concern in recent years. Aware of this challenge, this research set out to develop predictive models that allow early identification of people at risk for Alzheimer's disease, considering several variables associated with the disease. To achieve this objective, data mining techniques were employed, specifically the decision tree algorithm, using the RapidMiner Studio tool. The sample, explore, modify, model, and assess (SEMMA) methodology was implemented systematically at each stage of model development, ensuring an orderly and structured approach. The results obtained revealed that 45.00% of people with dementia present characteristics that identify them as candidates for confirmation of a diagnosis of Alzheimer's disease. In contrast, 52.78% of those who do not have dementia show no danger of contracting the disease. In the conclusion of the research, it was noted that most patients diagnosed with Alzheimer's are older than 65 years, indicating that this stage of life tends to trigger brain changes associated with the disease. This finding underscores the importance of considering age as a key factor in the early identification of the disease.

Journal Article

Data analysis and visualization using Python: analyze data to create visualizations for BI systems

by Embarak, Ossama, author in Python (Computer program language) , Programming languages (Electronic computers) , Data mining.

Look at Python from a data science point of view and learn proven techniques for data visualization as used in making critical business decisions. Starting with an introduction to data science with Python, you will take a closer look at the Python environment and get acquainted with editors such as Jupyter Notebook and Spyder. After going through a primer on Python programming, you will grasp fundamental Python programming techniques used in data science. Moving on to data visualization, you will see how it caters to modern business needs and forms a key factor in decision-making. You will also take a look at some popular data visualization libraries in Python. Shifting focus to data structures, you will learn the various aspects of data structures from a data science perspective. You will then work with file I/O and regular expressions in Python, followed by gathering and cleaning data. Moving on to exploring and analyzing data, you will look at advanced data structures in Python. Then, you will take a deep dive into data visualization techniques, going through a number of plotting systems in Python. In conclusion, you will complete a detailed case study, where you'll get a chance to revisit the concepts you've covered so far. What You Will Learn: use Python programming techniques for data science; master data collections in Python; create engaging visualizations for BI systems; deploy effective strategies for gathering and cleaning data; integrate the Seaborn and Matplotlib plotting systems. Who This Book Is For: developers with basic Python programming knowledge looking to adopt key strategies for data analysis and visualizations using Python.--Provided by publisher.

Book

An overview of data mining

by Chorianopoulos, Antonios in customer “signature” data , data mining applications , data mining methodology overview

2015

This chapter presents a brief overview of the main concepts of data mining. It introduces the main marketing applications supported by data mining models. It explains the main types of data mining models and algorithms. It also introduces a process model, a methodological framework for designing and implementing successful data mining projects.

Book Chapter

Subjective well-being and social media

by Iacus, Stefano M. (Stefano Maria), author , Porro, Giuseppe, author in Online social networks Use studies. , Online social networks Research Data processing. , Online social networks Psychological aspects.

"Subjective Well-Being and Social Media shows how, by exploiting the unprecedented amount of information provided by the social networking sites, it is possible to build new composite indicators of subjective well-being. These new social media indicators are complementary to official statistics and surveys, whose data are collected at very low temporal and geographical resolution. The book also explains in full detail how to solve the problem of selection bias coming from social media data. Mixing textual analysis, machine learning and time series analysis, the book also shows how to extract both the structural and the temporary components of subjective well-being. Cross-country analysis confirms that well-being is a complex phenomenon that is governed by macroeconomic and health factors, ageing, temporary shocks and cultural and psychological aspects. As an example, the last part of the book focuses on the impact of the prolonged stress due to the COVID-19 pandemic on subjective well-being in both Japan and Italy. Through a data science approach, the results show that a consistent and persistent drop occurred throughout 2020 in the overall level of well-being in both countries. The methodology presented in this book: enables social scientists and policy makers to know what people think about the quality of their own life, minimizing the bias induced by the interaction between the researcher and the observed individuals; being language-free, it allows for comparing the well-being perceived in different linguistic and socio-cultural contexts, disentangling differences due to objective events and life conditions from dissimilarities related to social norms or language specificities; provides a solution to the problem of selection bias in social media data through a systematic approach based on time-space small area estimation models. The book also comes with replication R scripts and data. Stefano M. Iacus is full professor of Statistics at the University of Milan, on leave at the Joint Research Centre of the European Commission. Former R-core member (1999-2017) and R Foundation Member. Giuseppe Porro is full professor of Economic Policy at the University of Insubria. An earlier version of this project was awarded the Italian Institute of Statistics-Google prize for 'official statistics and big data'"-- Provided by publisher.

Book

Preparing to Model the Data

by Larose, Daniel T , Larose, Chantal D in bias‐variance trade‐off , data mining methodology , statistical methodology

2014

Data mining methods may be categorized as either supervised or unsupervised. In unsupervised methods, no target variable is identified as such. Instead, the data mining algorithm searches for patterns and structure among all the variables. Most data mining methods are supervised methods. Statistical methodology and data mining methodology differ in two ways: 1) Applying statistical inference using the huge sample sizes encountered in data mining tends to result in statistical significance, even when the results are not of practical significance. 2) In statistical methodology, the data analyst has an a priori hypothesis in mind. Data mining procedures usually do not have an a priori hypothesis, instead freely trolling through the data for actionable results. The bias–variance trade‐off is another way of describing the overfitting/underfitting dilemma.

Book Chapter
