Catalogue Search | MBRL

A Model Selection Approach for Time Series Forecasting: Incorporating Google Trends Data in Australian Macro Indicators

by Pardede, Eric , Mann, Scott , Karim, Ali Abdul in Accuracy , Artificial neural networks , Australia

2023

This study examined whether the behaviour of Internet search users obtained from Google Trends contributes to the forecasting of two Australian macroeconomic indicators: monthly unemployment rate and monthly number of short-term visitors. We assessed the performance of traditional time series linear regression (SARIMA) against a widely used machine learning technique (support vector regression) and a deep learning technique (convolutional neural network) in forecasting both indicators across different data settings. Our study focused on the out-of-sample performance of the SARIMA, SVR, and CNN models in forecasting the two Australian indicators. We adopted a multi-step approach to compare the performance of the models built over different forecasting horizons and assessed the impact of incorporating Google Trends data in the modelling process. Our approach supports a data-driven framework, which reduces the number of features prior to selecting the best-performing model. The experiments showed that incorporating Internet search data in the forecasting models improved the forecasting accuracy and that the results were dependent on the forecasting horizon, as well as the technique. To the best of our knowledge, this study is the first to assess the usefulness of Google search data in the context of these two economic variables. An extensive comparison of the performance of traditional and machine learning techniques on different data settings was conducted to enable the selection of an efficient model, including the forecasting technique, horizon, and modelling features.

Journal Article

Heat alerts and information-seeking behavior: evidence from heat-related internet searches in the United States

by Adams, Quinn H , Wellenius, Gregory A , Milando, Chad W in Air conditioning , Cooling , Data search

2025

Extreme heat is a growing public health threat. In the United States (US), the National Weather Service (NWS) issues alerts ahead of forecast periods of extreme heat. However, the behavioral impact of official heat alerts remains poorly understood at actionable scales, and their real-world effectiveness has been difficult to quantify. We used anonymized, county-level Google search data on the proportion of searches classified into eight heat-associated categories, aggregated daily across 2581 US counties from May to September 2023. We implemented a time-stratified case-crossover method using conditional Poisson models to quantify the association between the proportion of heat-related internet searches and (a) daily maximum temperature and (b) NWS heat alerts, adjusting for same-day county-specific temperature percentiles. We further evaluated how associations varied spatially and temporally within the season. Across all counties, searches for heat stroke/exhaustion were 3.60 (95% CI, 3.38–3.85) times higher when comparing the 95th percentile of daily maximum temperatures to the 1st percentile. Air-conditioning searches were 2.47 (2.43–2.51) times higher. Exposure–response curves rose steeply above the 80th percentile except for public swimming and cooling center queries. On heat alert days, heat-related searches were 1.27 (1.26, 1.28) times higher relative to matched non-alert days. Results varied by region. However, effect modification was pronounced: early-summer alerts (May–June) elicited stronger responses than late-summer alerts (July–September) in all heat-related search categories except cooling centers. Our findings suggest that heat alerts trigger meaningful, real-time behavioral responses during heatwaves, particularly in early summer and historically cooler regions. 
High-resolution internet search data offer a promising tool for evaluating public engagement with risk communication and can provide local officials guidance for further optimizing messaging strategies.

Journal Article

Assessing Spurious Correlations in Big Search Data

by Roberts, Ryan J. , Richman, Jesse T. in Big Data , big search data , Bonferroni

2023

Big search data offers the opportunity to identify new and potentially real-time measures and predictors of important political, geographic, social, cultural, economic, and epidemiological phenomena, measures that might serve an important role as leading indicators in forecasts and nowcasts. However, it also presents vast new risks that scientists or the public will identify meaningless and totally spurious ‘relationships’ between variables. This study is the first to quantify that risk in the context of search data. We find that spurious correlations arise at exceptionally high frequencies among probability distributions examined for random variables based upon gamma (1, 1) and Gaussian random walk distributions. Quantifying these spurious correlations and their likely magnitude for various distributions has value for several reasons. First, analysts can make progress toward accurate inference. Second, they can avoid unwarranted credulity. Third, they can demand appropriate disclosure from the study authors.
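
The risk described in this abstract can be illustrated with a small simulation. The sketch below is not the authors' procedure: it only compares pairs of independent Gaussian random walks with pairs of independent white-noise series and counts how often the absolute Pearson correlation exceeds a threshold. The series length, number of pairs, threshold, and all function names are assumptions chosen for illustration.

```python
import math
import random


def random_walk(n, rng):
    """Gaussian random walk: cumulative sum of independent N(0, 1) steps."""
    walk, level = [], 0.0
    for _ in range(n):
        level += rng.gauss(0.0, 1.0)
        walk.append(level)
    return walk


def white_noise(n, rng):
    """Independent N(0, 1) draws with no memory between time points."""
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def pearson(x, y):
    """Sample Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def share_strongly_correlated(make_series, pairs=2000, n=100,
                              threshold=0.5, seed=0):
    """Fraction of independent series pairs with |r| above the threshold."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(pairs):
        r = pearson(make_series(n, rng), make_series(n, rng))
        if abs(r) > threshold:
            hits += 1
    return hits / pairs


# Both kinds of pairs are independent by construction, so every
# "strong" correlation counted here is spurious.
walk_share = share_strongly_correlated(random_walk)
noise_share = share_strongly_correlated(white_noise)
print(f"random walks: {walk_share:.2%}, white noise: {noise_share:.2%}")
```

Under these assumptions, the random-walk pairs should cross the threshold far more often than the white-noise pairs, which is why trending search series call for the kind of caution the abstract describes.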

Journal Article

An interval-valued carbon price forecasting method based on web search data and social media sentiment

by Liu, Jinpei , Wang, Piao , Chen, Huayou in Accuracy , algorithms , Aquatic Pollution

2023

Accurate carbon price prediction is a crucial task for the carbon trading market. Previous studies have ignored the impact of online data and are limited to point predictions, which brings challenges to the accurate forecasting of carbon prices. To address those issues, this paper proposes an interval-valued carbon price forecasting method based on web search data and social media sentiment. First, we collect web search data and social media sentiment to improve prediction performance by synthesizing multiple types of data information. Second, we employ principal component analysis (PCA) to preprocess high-dimensional web search data, and utilize BosonNLP for quantifying social media information, thereby enhancing the predictability of the dataset. Subsequently, variational mode decomposition (VMD) is applied to the carbon price and online data, followed by particle swarm optimization support vector regression (PSO-SVR) to predict each sub-mode; the sub-mode predictions are then summed to obtain the final forecast. Finally, using carbon prices in Guangdong and Hubei provinces as case studies, the experimental results demonstrate that web search data and social media sentiment significantly enhance the predictive accuracy of interval-valued carbon prices. Furthermore, the proposed VMD-PSO-SVR outperforms other comparative models in the accuracy and reliability of interval-valued forecasting.

Journal Article

Evolutionary computation for solving search-based data analytics problems

in Algorithms , Big Data , Bioinformatics

2021

Automatic extraction of knowledge from massive data samples, i.e., big data analytics (BDA), has emerged as a vital task in almost all scientific research fields. BDA problems are difficult to solve due to their large-scale, high-dimensional, and dynamic properties, while problems with small data are usually hard to handle due to insufficient data samples and incomplete information. Such difficulties lead to the search-based data analytics problem, where a data analysis task is modeled as a complex, dynamic, and computationally expensive optimization problem and then solved by using an iterative algorithm. In this paper, we present an extensive and in-depth discussion of the use of evolutionary computation (EC) based optimization methods [including evolutionary algorithms (EAs) and swarm intelligence (SI)] for solving search-based data analysis problems. Then, as an illustrative example, we provide a comprehensive review of the applications of state-of-the-art EC methods to different types of data mining problems in bioinformatics. Here, detailed analysis and discussion are conducted on three types of data samples: sequence data, network data, and image data. Finally, we survey the challenges faced by EC methods and the trends for future directions. Based on the applications of EC methods to search-based data analysis problems involving inexact and uncertain information, the insights of data analytics can be better understood, and more efficient algorithms could be designed to solve real-world complex BDA problems.

Journal Article

Low validity of Google Trends for behavioral forecasting of national suicide rates

by Voracek, Martin , Andel, Rita , Till, Benedikt in Analysis , Austria , Computer and Information Sciences

2017

Recent research suggests that search volumes of the most popular search engine worldwide, Google, provided via Google Trends, could be associated with national suicide rates in the USA, UK, and some Asian countries. However, search volumes have mostly been studied in an ad hoc fashion, without controls for spurious associations. This study evaluated the validity and utility of Google Trends search volumes for behavioral forecasting of suicide rates in the USA, Germany, Austria, and Switzerland. Suicide-related search terms were systematically collected and respective Google Trends search volumes evaluated for availability. Time spans covered 2004 to 2010 (USA, Switzerland) and 2004 to 2012 (Germany, Austria). Temporal associations of search volumes and suicide rates were investigated with time-series analyses that rigorously controlled for spurious associations. The number and reliability of analyzable search volume data increased with country size. Search volumes showed various temporal associations with suicide rates. However, associations differed both across and within countries and mostly followed no discernable patterns. The total number of significant associations roughly matched the number of expected Type I errors. These results suggest that the validity of Google Trends search volumes for behavioral forecasting of national suicide rates is low. The utility and validity of search volumes for the forecasting of suicide rates depend on two key assumptions ("the population that conducts searches consists mostly of individuals with suicidal ideation", "suicide-related search behavior is strongly linked with suicidal behavior"). We discuss strands of evidence that these two assumptions are likely not met. Implications for future research with Google Trends in the context of suicide research are also discussed.
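
The comparison with "the number of expected Type I errors" rests on simple arithmetic: if every tested association is truly null and each test uses significance level alpha, about num_tests × alpha tests are expected to come out significant by chance. The sketch below illustrates that baseline. The figures of 200 tests and alpha = 0.05 are hypothetical and are not taken from the study, and the function names are illustrative only.

```python
import random


def expected_type_i_errors(num_tests, alpha=0.05):
    """Expected count of false positives when every null hypothesis is true."""
    return num_tests * alpha


def simulate_false_positives(num_tests, alpha=0.05, seed=0):
    """Count 'significant' results among p-values drawn under the null.

    Under a true null hypothesis a p-value is uniform on [0, 1),
    so each test is falsely significant with probability alpha.
    """
    rng = random.Random(seed)
    return sum(1 for _ in range(num_tests) if rng.random() < alpha)


num_tests = 200  # hypothetical number of tested associations
expected = expected_type_i_errors(num_tests)

# Average the simulated count over many replications.
replications = 500
average = sum(
    simulate_false_positives(num_tests, seed=s) for s in range(replications)
) / replications
print(f"expected: {expected:.1f}, simulated average: {average:.2f}")
```

When the observed number of significant associations is close to this baseline, as the abstract reports, chance alone is a sufficient explanation for them.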

Journal Article

The Power of Travel Search Data in Forecasting the Tourism Demand in Dubai

by Rashad, Ahmed Shoukry in Analysis , Artificial intelligence , big data

2022

Tourism plays an important economic role for many economies, and after the COVID-19 pandemic, accurate tourism forecasting has become critical for policymakers in tourism-dependent economies. This paper extends the growing literature on the use of internet search data in tourism forecasting by evaluating the predictive ability of Destination Insight with Google, a new Google product designed to monitor tourism recovery after the COVID-19 pandemic. This paper is the first attempt to explore the forecasting ability of the new Google data. The study focuses on the case of Dubai, given its status as a world-leading tourism destination. The study uses time series models that account for seasonality, trending variables, and structural breaks. The study uses monthly data for the period of January 2019 to April 2022. We explore whether internet travel search queries can improve the forecasting of tourist arrivals to Dubai from the UK. We evaluate the accuracy of forecasts after incorporating the Google variable in our model. Our findings suggest that the new Google data can significantly improve tourism forecasting and serve as a leading indicator of tourism demand.

Journal Article

Enabling Comparable Search Over Encrypted Data for IoT with Privacy-Preserving

by Xu, Lei , Liu, Zhongyi , Wang, Yunling in Cloud computing , Confidentiality , Data search

2019

With the rapid development of cloud computing and Internet of Things (IoT) technology, massive amounts of data are generated and transmitted over the network every day. To ensure the confidentiality and usability of these data, industry and company users encrypt their data and store them with an outsourced party. However, simply adopting an encryption scheme makes the original data lose their flexibility and usability. To address these problems, searchable encryption schemes have been proposed. Unlike traditional encrypted data search schemes, this paper focuses on providing a solution for searching the data from one or more IoT devices by comparing their underlying numerical values. We present a multi-client comparable search scheme over encrypted numerical data which supports range queries. This scheme is mainly designed to preserve the confidentiality and searchability of numeric data; it enables authorized clients to fetch the data from different data owners using a generated token. Furthermore, to enrich the scheme's functionality, we exploit the idea of secret sharing to realize cross-domain search, which improves the data's utilization. The proposed scheme has also been proven to be secure through a series of security games. Moreover, we conduct experiments to demonstrate that our scheme is more practical than existing similar schemes and achieves a balance between functionality and efficiency.

Journal Article

Identifying the geographic leading edge of Lyme disease in the United States with internet searches: A spatiotemporal analysis of Google Health Trends data

by Aucott, John N. , Curriero, Frank C. , Rebman, Alison W. in Analysis , Arachnids , Biology and Life Sciences

2024

The geographic footprint of Lyme disease is expanding in the United States, which calls for novel methods to identify emerging endemic areas. The ubiquity of internet use coupled with the dominance of Google's search engine makes Google user search data a compelling data source for epidemiological research. We evaluated the potential of Google Health Trends to track spatiotemporal patterns in Lyme disease and identify the leading edge of disease risk in the United States. We analyzed internet search rates for Lyme disease-related queries at the designated market area (DMA) level (n = 206) for the 2011-2019 and 2020-2021 (COVID-19 pandemic) periods. We used maps and other exploratory methods to characterize changes in search behavior. To assess statistical correlation between searches and Lyme disease cases reported to the Centers for Disease Control and Prevention (CDC) between 2011 and 2019, we performed a longitudinal ecological analysis with modified Poisson generalized estimating equation regression models. Mapping DMA-level changes in "Lyme disease" search rates revealed an expanding area of higher rates occurring along the edges of the northeastern focus of Lyme disease. Bivariate maps comparing search rates and CDC-reported incidence rates also showed a stronger than expected signal from Google Health Trends in some high-risk adjacent states such as Michigan, North Carolina, and Ohio, which may be further indication of a geographic leading edge of Lyme disease that is not fully apparent from routine surveillance. Searches for "Lyme disease" were a significant predictor of CDC-reported disease incidence. Each 100-unit increase in the search rate was significantly associated with a 10% increase in incidence rates (RR = 1.10, 95% CI: 1.07, 1.12) after adjusting for environmental covariates of Lyme disease identified in the literature. 
Google Health Trends data may help track the expansion of Lyme disease and inform the public and health care providers about emerging risks in their areas.
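
The reported rate ratio (RR = 1.10 per 100-unit increase in the search rate) can be rescaled to other increments, assuming the usual log-linear form of a Poisson regression in which the effect is multiplicative on the rate. The sketch below is a worked calculation under that assumption only. It does not reproduce the study's model, and the function name is illustrative.

```python
import math

RR_PER_100_UNITS = 1.10  # point estimate reported in the abstract


def rate_ratio(delta_units, rr_per_100=RR_PER_100_UNITS):
    """Rate ratio implied for an arbitrary change in the search rate.

    Assumes a log-linear model, so the coefficient per single unit is
    log(RR per 100 units) / 100 and effects multiply across increments.
    """
    beta_per_unit = math.log(rr_per_100) / 100.0
    return math.exp(beta_per_unit * delta_units)


for delta in (50, 100, 200):
    print(f"{delta:>3}-unit increase: RR = {rate_ratio(delta):.3f}")
```

Under this assumption a 200-unit increase corresponds to 1.10 squared, or about a 21% higher incidence rate, rather than a 20% increase.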

Journal Article

Intelligence in Tourism Management: A Hybrid FOA-BP Method on Daily Tourism Demand Forecasting with Web Search Data

by Lu, Wenxing , Liang, Changyong , Wang, Binyou in Algorithms , Back propagation networks , Data search

2019

The Chinese tourism industry has been developing rapidly for the past several years, and the number of people traveling has been increasing year by year. However, many problems still beset current tourism management. Lack of effective management has caused numerous problems, such as tourists stranded during tourist season and the declining service quality of scenic spots, which have become the focus of tourists’ attention. Network search data can intuitively reflect the attention of most users through the combination of the network search index and the back propagation (BP) neural network model. This study predicts the daily tourism demand in the Huangshan scenic spot in China. The filtered keyword in the Baidu index is added to the hybrid neural network, and a BP neural network model optimized by a fruit fly optimization algorithm (FOA) based on the web search data is established in this study. Different forecasting methods are compared in this paper; the results prove that compared with other prediction models, higher accuracy can be obtained when it comes to the peak season using the FOA-BP method that includes web search data, which is a sustainable means of practically solving the tourism management problem by a more accurate prediction of tourism demand of scenic spots.

Journal Article
