Catalogue Search | MBRL

A comparison of statistical methods for modeling count data with an application to hospital length of stay

by Fernandez, Gustavo A.; Vatcheva, Kristina P. in Coronaviruses, Count data, COVID-19

2022

Background: Hospital length of stay (LOS) is a key indicator of hospital care management efficiency, cost of care, and hospital planning. Hospital LOS is often used as a measure of a post-medical procedure outcome, as a guide to the benefit of a treatment of interest, or as an important risk factor for adverse events. Therefore, understanding hospital LOS variability is always an important healthcare focus. Hospital LOS data can be treated as count data, with discrete and non-negative values, typically right-skewed, and often exhibiting excess zeros. In this study, we compared the performance of the Poisson, negative binomial (NB), zero-inflated Poisson (ZIP), and zero-inflated negative binomial (ZINB) regression models using simulated and empirical data. Methods: Data were generated under different simulation scenarios with varying sample sizes, proportions of zeros, and levels of overdispersion. Analysis of hospital LOS was conducted using empirical data from the Medical Information Mart for Intensive Care database. Results: Poisson and ZIP models performed poorly in overdispersed data. ZIP outperformed the rest of the regression models when the overdispersion was due to zero-inflation only. NB and ZINB regression models faced substantial convergence issues when incorrectly used to model equidispersed data. The NB model provided the best fit in overdispersed data and outperformed the ZINB model in many simulation scenarios with combinations of zero-inflation and overdispersion, regardless of the sample size. In the empirical data analysis, we demonstrated that fitting incorrect models to overdispersed data led to incorrect regression coefficient estimates and overstated the significance of some of the predictors. Conclusions: Based on this study, we recommend that researchers consider ZIP models for count data with zero-inflation only and NB models for overdispersed data or data with combinations of zero-inflation and overdispersion. If the researcher believes there are two different data-generating mechanisms producing zeros, then the ZINB regression model may provide greater flexibility when modeling the zero-inflation and overdispersion.

Journal Article
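
The distinction the abstract draws between zero-inflation and overdispersion can be checked directly from the four distributions. The sketch below is illustrative and is not the authors' simulation code: it assumes the common NB2 parameterization (mean μ, size k, variance μ + μ²/k), a constant zero-inflation probability π, and the hypothetical values μ = 3, k = 1, π = 0.3. The helper names are made up for this example.

```python
import math

def poisson_pmf(y: int, mu: float) -> float:
    """P(Y = y) for a Poisson count with mean mu (mu > 0)."""
    return math.exp(y * math.log(mu) - mu - math.lgamma(y + 1))

def nb_pmf(y: int, mu: float, k: float) -> float:
    """NB2 pmf with mean mu and size k, so that Var(Y) = mu + mu**2 / k."""
    log_p = (math.lgamma(y + k) - math.lgamma(k) - math.lgamma(y + 1)
             + k * math.log(k / (k + mu)) + y * math.log(mu / (k + mu)))
    return math.exp(log_p)

def zero_inflated(pmf, pi: float):
    """Mix a point mass at zero (weight pi) with a count pmf (weight 1 - pi)."""
    def zi_pmf(y: int, *params: float) -> float:
        return (pi if y == 0 else 0.0) + (1.0 - pi) * pmf(y, *params)
    return zi_pmf

def moments(pmf, params, y_max: int = 2000):
    """Mean, variance and P(Y = 0), by summing the pmf over 0..y_max."""
    probs = [pmf(y, *params) for y in range(y_max + 1)]
    mean = sum(y * p for y, p in enumerate(probs))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
    return mean, var, probs[0]

mu, k, pi = 3.0, 1.0, 0.3  # hypothetical illustration values, not from the study
models = [
    ("Poisson", poisson_pmf, (mu,)),
    ("NB", nb_pmf, (mu, k)),
    ("ZIP", zero_inflated(poisson_pmf, pi), (mu,)),
    ("ZINB", zero_inflated(nb_pmf, pi), (mu, k)),
]
for name, pmf, params in models:
    mean, var, p0 = moments(pmf, params)
    print(f"{name:8s} mean={mean:.3f}  var/mean={var / mean:.2f}  P(Y=0)={p0:.3f}")
```

The printed variance-to-mean ratios should match the closed forms: 1 for Poisson, 1 + μ/k = 4 for NB, 1 + πμ = 1.9 for ZIP, and 1 + πμ + μ/k = 4.9 for ZINB. ZIP is therefore overdispersed even though its count component is Poisson, which is the "overdispersion due to zero-inflation only" case in which the abstract reports ZIP performing best.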

Early warning and predicting of COVID-19 using zero-inflated negative binomial regression model and negative binomial regression model

by Wang, Xiaomin; Huang, Daizheng; Huang, Tengda in Analysis, Bacterial pneumonia, Baidu search index

2024

Background: It is difficult to detect an outbreak of an emerging infectious disease using the existing surveillance system. Here we investigate the utility of the Baidu Search Index, an indicator of a keyword’s search volume on Baidu, for early warning and prediction of the epidemic trend of COVID-19. Methods: The daily number of cases and the Baidu Search Index of 8 keywords (weighted by population) from December 1, 2019 to March 15, 2020 were collected and analyzed using time-series analysis and Spearman correlation at different time lags. To predict the daily number of COVID-19 cases from the Baidu Search Index, a zero-inflated negative binomial regression model was used in phase 1 and a negative binomial regression model was used in phases 2 and 3, based on the characteristics of the independent variable. Results: The Baidu Search Index of all keywords was significantly higher in Wuhan than in Hubei (excluding Wuhan) and China (excluding Hubei). Before the causative pathogen was identified, the search volumes of “Influenza” and “Pneumonia” in Wuhan increased with the number of new-onset cases, with correlation coefficients of 0.69 and 0.59, respectively. After the pathogen was made public but before COVID-19 was classified as a notifiable disease, the search volumes of “SARS”, “Pneumonia”, and “Coronavirus” in all study areas increased with the number of new-onset cases (correlation coefficients 0.69 to 0.89), while “Influenza” became negatively correlated (r_s: -0.56 to -0.64). After COVID-19 was closely monitored, the Baidu Search Index of “COVID-19”, “Pneumonia”, “Coronavirus”, “SARS”, and “Mask” could predict the epidemic trend with lead times of 15, 5, and 6 days in Wuhan, Hubei (excluding Wuhan), and China (excluding Hubei), respectively. From 21 January to 9 February, the predicted numbers of cases in Wuhan and Hubei (excluding Wuhan) were 1.84-fold and 4.81-fold higher, respectively, than the actual numbers of cases.
Conclusion: The Baidu Search Index could be used for early warning and prediction of the epidemic trend of COVID-19, but the informative search keywords changed across periods. Considering the time lag from onset to diagnosis, especially in areas with a shortage of medical resources, internet search data can be a highly effective supplement to the existing surveillance system.

Journal Article
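
The lagged-correlation step in the Methods can be sketched as follows. This is a hypothetical illustration, not the authors' code: `search` and `cases` are assumed to be equal-length daily series, a positive lag means the search index leads the case counts, and ties are handled with average ranks (the usual convention for Spearman's rho). All function names are made up for this example.

```python
import math

def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rho as the Pearson correlation of the tie-averaged ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)

def lagged_spearman(search, cases, lag):
    """Correlate the search index on day t with case counts on day t + lag."""
    n = len(search) - lag
    return spearman(search[:n], cases[lag:lag + n])

def best_lead_time(search, cases, max_lag):
    """Lag in days (0..max_lag) with the largest Spearman correlation."""
    return max(range(max_lag + 1), key=lambda lag: lagged_spearman(search, cases, lag))
```

A constant series (for example, a keyword with no searches in the window) has zero rank variance and raises `ZeroDivisionError`, so such keywords need to be filtered out before scanning lags.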

Natural regeneration following wind disturbance increases the diversity of managed lowland forests in NE Poland

by Zaremba, Jakub; Tomski, Andrzej; Szwagrzyk, Jerzy in Abundance, Betula pendula, Biodiversity

2018

Questions: Are there significant differences in the density and composition of natural regeneration among habitat types? Is the abundance of regeneration higher in patches more seriously damaged by a windstorm than in patches not affected by the wind? Is the species diversity of regeneration greater than the diversity of mature trees prior to disturbance? Location: Szast Protected Forest, NE Poland. Methods: Throughout the Szast P.F., 111 sample plots were distributed in a regular grid 13 years after a windstorm. In plots located in disturbed forests, we measured all the canopy trees and the 30 young individuals of each size class (seedlings, short saplings and tall saplings) closest to the plot centre. In non-disturbed patches, we measured all trees within a plot of predefined radius. For statistical analyses of differences in the diversity between mature stands and young generation, we used the Kruskal–Wallis test followed by Dunn's multiple comparison test. The relationship between the canopy layer and the young generation of trees was analysed using Spearman's rank correlation and a classic negative binomial regression. Results: Natural regeneration was more abundant in the coniferous and mixed coniferous habitat types than in the mixed deciduous type. The density of young trees was negatively correlated with the basal area of the trees that survived the windstorm and was positively correlated with canopy tree mortality. After the windstorm, Pinus sylvestris lost more trees than the other species; Picea abies and Betula pendula slightly increased their share; and the species that benefited most from the disturbance was Quercus robur. In the coniferous habitat type, the species diversity of the young generation of trees was higher than the diversity of the canopy trees prior to the windstorm. Conclusions: Leaving wind-disturbed areas to natural regeneration could be a viable option for converting coniferous plantations into more diverse and species-rich stands.

Journal Article
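
A minimal sketch of the Kruskal–Wallis H statistic used for the diversity comparison, assuming each group is a list of per-plot diversity values (the grouping and the function names are illustrative, not taken from the study). It pools the observations, assigns average ranks to ties, and applies the standard tie correction; Dunn's post-hoc comparisons are not shown.

```python
from collections import Counter

def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H with the usual correction for tied observations."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    h, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])  # ranks of this group's members
        h += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * h - 3 * (n + 1)
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    return h / (1 - ties / (n ** 3 - n))
```

H is referred to a χ² distribution with (number of groups − 1) degrees of freedom; with three groups (2 df) the p-value is simply exp(−H/2). If every observation is tied, the correction term is zero and the statistic is undefined.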

Research Constituents, Intellectual Structure, and Collaboration Patterns in Journal of International Marketing

by Donthu, Naveen; Kumar, Satish; Pandey, Nitesh in Bibliometrics, Marketing, Qualitative research

2021

This study presents a retrospective on Journal of International Marketing using bibliometrics. The study finds that the journal’s run has been characterized by continuous growth in publications and citations, with a dominant contribution base of authors from the United States. Authors have consistently shown a strong preference for quantitative research, with a decline in preference for qualitative research and a negligible increase in preference for mixed-methods research in recent years. The major themes in the journal include global branding, internationalization, cross-cultural marketing, and international relationship marketing. An exploration of the factors affecting article citations reveals that article attributes such as the conceptual method, empirical method, article length, title length, article age, and number of keywords play significant roles in increasing the number of citations. Authors affiliated with nonacademic institutions also have a significant and positive influence on total citations. The article concludes with directions for further research.

Journal Article

Bayesian Inference for Logistic Models Using Pólya–Gamma Latent Variables

by Polson, Nicholas G.; Scott, James G.; Windle, Jesse in Approximation, Augmentation, Bayesian analysis

2013

We propose a new data-augmentation strategy for fully Bayesian inference in models with binomial likelihoods. The approach appeals to a new class of Pólya–Gamma distributions, which are constructed in detail. A variety of examples are presented to show the versatility of the method, including logistic regression, negative binomial regression, nonlinear mixed-effect models, and spatial models for count data. In each case, our data-augmentation strategy leads to simple, effective methods for posterior inference that (1) circumvent the need for analytic approximations, numerical integration, or Metropolis–Hastings; and (2) outperform other known data-augmentation strategies, both in ease of use and in computational efficiency. All methods, including an efficient sampler for the Pólya–Gamma distribution, are implemented in the R package BayesLogit. Supplementary materials for this article are available online.

Journal Article
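
To show what the augmentation buys, the sketch below runs the two-step Gibbs sampler for the simplest case: an intercept-only binomial logistic model, y successes in n trials, with a normal prior on β (mean b, variance B). Given ω ~ PG(n, β), the conditional for β is Gaussian with variance V = 1/(ω + 1/B) and mean V(κ + b/B), where κ = y − n/2. Two simplifications are assumptions of this sketch: Pólya–Gamma draws come from truncating the infinite sum-of-gammas representation at 100 terms (an approximation, not the efficient sampler the authors implement in BayesLogit), and all function names are hypothetical.

```python
import math
import random

def sample_pg(b, c, rng, n_terms=100):
    """Approximate PG(b, c) draw via the truncated representation
    omega = 1/(2 pi^2) * sum_k g_k / ((k - 1/2)^2 + c^2 / (4 pi^2)), g_k ~ Gamma(b, 1)."""
    shift = c * c / (4 * math.pi ** 2)
    total = sum(rng.gammavariate(b, 1.0) / ((k - 0.5) ** 2 + shift)
                for k in range(1, n_terms + 1))
    return total / (2 * math.pi ** 2)

def pg_mean(b, c):
    """Exact mean of PG(b, c), a cheap check on the truncated sampler."""
    return b / 4 if c == 0 else b * math.tanh(c / 2) / (2 * c)

def gibbs_logit_intercept(successes, trials, prior_mean=0.0, prior_var=100.0,
                          n_iter=2000, burn_in=200, seed=1):
    """Posterior draws of beta in y ~ Binomial(trials, 1 / (1 + exp(-beta)))."""
    rng = random.Random(seed)
    kappa = successes - trials / 2                      # kappa = y - n/2
    beta, draws = 0.0, []
    for it in range(n_iter):
        omega = sample_pg(trials, beta, rng)            # omega | beta ~ PG(n, beta)
        post_var = 1.0 / (omega + 1.0 / prior_var)      # beta | omega is Gaussian
        post_mean = post_var * (kappa + prior_mean / prior_var)
        beta = rng.gauss(post_mean, math.sqrt(post_var))
        if it >= burn_in:
            draws.append(beta)
    return draws

draws = gibbs_logit_intercept(successes=30, trials=100)
print(sum(draws) / len(draws))  # with a weak prior, close to the MLE log(0.3 / 0.7)
```

Neither step needs a Metropolis–Hastings correction: both conditionals are sampled directly, which is the property the abstract highlights.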

Statistical Analysis of Truck Accidents for Divided Multilane Interurban Roads in Turkey

by Aytac, Bengi P.; Kibar, Funda Ture; Celik, Fazil in Accidents, Civil Engineering, Coefficients

2018

Freight transportation is an important factor in Turkish economic growth, and the high volume of truck traffic has increased traffic accidents on Turkish roads. However, to the best of our knowledge, no studies have investigated the factors that contribute to truck accidents. This study aims to reduce truck accident involvement and quantify the effect of variables on the occurrence of truck accidents on divided multilane interurban roads in Turkey. This study documents the performance of Poisson, Negative Binomial (NB), and Zero-inflated Negative Binomial Regression (ZINB) models to establish the relation between truck accidents and traffic and geometric road characteristics on a 282 km section of the Ankara–Aksaray–Eregli divided multilane interurban road. Model coefficients were estimated by the maximum likelihood method, and deviance and the Akaike information criterion were used as goodness-of-fit statistics. The Vuong test was used to determine the appropriateness of using the ZINB model rather than the NB model. The results show that the NB model fitted the data very well. The proposed model for Turkish divided multilane interurban roads with a high percentage of truck traffic might be useful to detect critical factors and reduce truck accident involvement.

Journal Article
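
The Vuong comparison reduces to a z-statistic on the per-observation log-likelihood differences m_i = ℓ_ZINB,i − ℓ_NB,i, namely z = √n · mean(m) / sd(m). A sketch, assuming the NB2 parameterization and fitted parameters obtained elsewhere (this code does not fit the models). It uses the n − 1 standard deviation and no AIC/BIC-style correction; implementations differ on both points. Function names are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev

def nb_logpmf(y, mu, k):
    """log P(Y = y) for NB2 with mean mu and size k."""
    return (math.lgamma(y + k) - math.lgamma(k) - math.lgamma(y + 1)
            + k * math.log(k / (k + mu)) + y * math.log(mu / (k + mu)))

def zinb_logpmf(y, mu, k, pi):
    """log P(Y = y) for a zero-inflated NB with structural-zero probability pi."""
    base = (1 - pi) * math.exp(nb_logpmf(y, mu, k))
    return math.log(pi + base) if y == 0 else math.log(base)

def vuong(ll_1, ll_2):
    """Vuong z-statistic from pointwise log-likelihoods of two fitted models.
    Large positive z favours model 1, large negative z favours model 2."""
    m = [a - b for a, b in zip(ll_1, ll_2)]
    z = math.sqrt(len(m)) * mean(m) / stdev(m)
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_two_sided
```

For road-segment accident counts, `ll_1` would hold `zinb_logpmf(y_i, mu_i, k, pi)` for each segment and `ll_2` the matching `nb_logpmf` values, each evaluated at its own fitted parameters. Using the Vuong test to choose between a zero-inflated model and its non-inflated counterpart has been criticized because the two models are not strictly non-nested, so the result is best read alongside information criteria such as AIC.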

Too many zeros and/or highly skewed? A tutorial on modelling health behaviour as count data with Poisson and negative binomial regression

by Green, James A. in Advanced Methods in Health Psychology and Behavioral Medicine, Binomial distribution, Count data

2021

Dependent variables in health psychology are often counts, for example, of a behaviour or number of engagements with an intervention. These counts can be very strongly skewed, and/or contain large numbers of zeros as well as extreme outliers. For example, 'How many cigarettes do you smoke on an average day?' The modal answer may be zero, but answers may range from 0 to 40+. The same can be true for minutes of moderate-to-vigorous physical activity. For some people, this may be near zero, but it can take on extreme values for someone training for a marathon. Typical analytical strategies for these data involve explicit (or implied) transformations (smoker v. non-smoker, log transformations). However, these data types are 'counts' (i.e. non-negative whole numbers) or quasi-counts (time is ratio but discrete minutes of activity could be analysed as a count), and can be modelled using count distributions - including the Poisson and negative binomial distributions (and their zero-inflated and hurdle extensions, which allow even more zeros). In this tutorial paper I demonstrate (in R, Jamovi, and SPSS) the easy application of these models to health psychology data, and their advantages over alternative ways of analysing this type of data, using two datasets: one with a highly dispersed dependent variable (number of views on YouTube), and another with a large number of zeros (number of days on which symptoms were reported over a month). The negative binomial distribution had the best fit for the overdispersed number of views on YouTube. Negative binomial and zero-inflated negative binomial were both good fits for the symptom data with over-abundant zeros. In both cases, count distributions provided not just a better fit but would lead to different conclusions compared to the poorly fitting traditional regression/linear models.

Journal Article
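
Before fitting anything, a quick descriptive screen along the lines of the tutorial's two examples compares the variance with the mean, and the observed zeros with the number a Poisson distribution of the same mean would produce. This is a sketch of a common rule of thumb, not a formal test, and it ignores covariates; with predictors, the same comparison is better made on a fitted model (for example, via the Pearson dispersion statistic). The function name is hypothetical.

```python
import math
from statistics import mean, variance

def dispersion_check(counts):
    """Descriptive screen for overdispersion and excess zeros in a count outcome."""
    m = mean(counts)
    v = variance(counts)  # sample variance, n - 1 denominator
    return {
        "mean": m,
        "variance": v,
        "variance_to_mean": v / m,  # roughly 1 if the counts are Poisson
        "observed_zeros": sum(1 for y in counts if y == 0),
        "poisson_expected_zeros": len(counts) * math.exp(-m),
    }
```

For the toy vector [0, 0, 0, 6], the mean is 1.5 and the sample variance is 9, a ratio of 6, and 3 zeros are observed against about 0.89 expected under a Poisson with mean 1.5. Both signals point away from a plain Poisson model.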

Visualizing Count Data Regressions Using Rootograms

by Zeileis, Achim; Kleiber, Christian in Codes, Data, Ethology

2016

The rootogram is a graphical tool associated with the work of J. W. Tukey that was originally used for assessing goodness of fit of univariate distributions. Here, we extend the rootogram to regression models and show that this is particularly useful for diagnosing and treating issues such as overdispersion and/or excess zeros in count data models. We also introduce a weighted version of the rootogram that can be applied out of sample or to (weighted) subsets of the data, for example, in finite mixture models. An empirical illustration revisiting a well-known dataset from ethology is included, for which a negative binomial hurdle model is employed. Supplementary materials providing two further illustrations are available online: the first, using data from public health, employs a two-component finite mixture of negative binomial models; the second, using data from finance, involves underdispersion. An R implementation of our tools is available in the R package countreg. It also contains the data and replication code.

Journal Article
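
The quantities behind a hanging rootogram are straightforward to compute. For each count k, the expected frequency is the sum over observations of the fitted probability P(Y_i = k), the curve is drawn at √expected, and a bar of height √observed hangs from it. A sketch with a Poisson fit by default; the function names are hypothetical and this is not the countreg API. The `pmf` argument can be swapped for any other two-argument count pmf (for instance an NB pmf with its size parameter fixed through a lambda).

```python
import math

def poisson_pmf(k, mu):
    """P(Y = k) for a Poisson count with mean mu (mu > 0)."""
    return math.exp(k * math.log(mu) - mu - math.lgamma(k + 1))

def rootogram_table(counts, fitted_means, pmf=poisson_pmf, max_count=None):
    """Rows (k, observed, expected, top, bottom) for a hanging rootogram, where
    expected = sum_i pmf(k, mu_i), top = sqrt(expected), bottom = top - sqrt(observed)."""
    if max_count is None:
        max_count = max(counts)
    rows = []
    for k in range(max_count + 1):
        observed = sum(1 for y in counts if y == k)
        expected = sum(pmf(k, mu) for mu in fitted_means)
        top = math.sqrt(expected)             # fitted curve on the square-root scale
        bottom = top - math.sqrt(observed)    # bar hangs down from the curve
        rows.append((k, observed, expected, top, bottom))
    return rows

def print_rootogram(rows):
    """Plain-text listing; a negative bottom means more observed than expected."""
    for k, obs, exp_, top, bottom in rows:
        flag = "  <- underpredicted" if bottom < 0 else ""
        print(f"{k:3d} obs={obs:4d} exp={exp_:8.2f} top={top:5.2f} bottom={bottom:6.2f}{flag}")
```

Bars whose bottom falls below zero mark counts the model underpredicts, and bars ending above zero mark counts it overpredicts; excess zeros typically show up as the k = 0 bar hanging below zero.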

Interdisciplinarity and impact: the effects of the citation time window

by Chen, Shiji; Song, Yanhui; Larivière, Vincent in Biodiversity, Citation analysis, Citation indexes

2022

The relationship between interdisciplinarity and citation impact is affected by many factors, and the citation time window is a crucial factor. Our study examines the effect of the citation time window on the relationship between interdisciplinarity and scientific impact. All journal articles published in 2006 in Web of Science (WoS) are considered. The relationship between interdisciplinarity and scientific impact is explored by conducting a year-by-year negative binomial regression analysis with different interdisciplinarity indicators. Three single-property diversity indicators (namely variety, balance, and disparity) and three typical composite interdisciplinarity indicators (Rao-Stirling index (RS), Leinster–Cobbold diversity indices (LCDiv), and DIV) are used in this study. The results show that evaluating the scientific impact of interdisciplinarity requires a sufficiently long citation time window. However, the length of the citation time window is different for different interdisciplinarity indicators. A 4-year citation time window is necessary when the variety indicator is used, whereas balance and disparity require at least 11-year and 13-year citation time windows, respectively. The citation time window is the same (at least 5 years) for the three composite interdisciplinarity indicators (RS, LCDiv, and DIV). The recommended length of the citation time window is based only on this study and may be affected by the data set, regression model, and discipline classification system.

Journal Article
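
The Rao-Stirling index combines variety, balance, and disparity in a single sum, Σ_{i≠j} d_ij p_i p_j, where p_i is typically the share of an article's references falling in category i and d_ij is a distance between categories (often 1 minus a cosine similarity). A sketch, assuming that definition over ordered pairs (some authors sum over i < j only, which halves the value) and taking disparity as the mean pairwise distance among the categories present. Balance, LCDiv, and DIV are not shown, and the function names are hypothetical.

```python
def variety(p):
    """Number of categories with a non-zero share."""
    return sum(1 for share in p if share > 0)

def disparity(p, d):
    """Mean distance over unordered pairs of categories that are present."""
    present = [i for i, share in enumerate(p) if share > 0]
    pairs = [(i, j) for a, i in enumerate(present) for j in present[a + 1:]]
    return sum(d[i][j] for i, j in pairs) / len(pairs) if pairs else 0.0

def rao_stirling(p, d):
    """Rao-Stirling diversity: sum over ordered pairs i != j of d_ij * p_i * p_j."""
    n = len(p)
    return sum(d[i][j] * p[i] * p[j] for i in range(n) for j in range(n) if i != j)
```

Shares can be formed from per-category reference counts as `[c / sum(counts) for c in counts]`. Because every term is weighted by both shares, a category with a tiny share adds to variety but barely moves the Rao-Stirling value.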

Estimating the Threshold Effects of Climate on Dengue: A Case Study of Taiwan

by Tran, Bao-Linh; Tseng, Wei-Chun; Chen, Chi-Chung

2020

Climate change is regarded as one of the major factors enhancing the transmission intensity of dengue fever. In this study, we estimated the threshold effects of temperature on the Aedes mosquito larval index as an early warning tool for dengue prevention. We also investigated the relationship between the dengue vector index and dengue epidemics in Taiwan using weekly panel data for 17 counties from January 2012 to May 2019. To achieve our goals, we first applied the panel threshold regression technique to test for threshold effects and determine critical temperature values. Data were then further decomposed into different sets corresponding to different temperature regimes. Finally, negative binomial regression models were applied to assess the non-linear relationship between meteorological factors and the Breteau index (BI). At the national level, we found that a 1°C temperature increase caused the expected value of BI to increase by 0.09 units when the temperature is less than 27.21 °C, and by 0.26 units when the temperature is greater than 27.21 °C. At the regional level, the dengue vector index was more sensitive to temperature changes because double threshold effects were found in the southern Taiwan model. For southern Taiwan, as the temperature increased by 1°C, the expected value of BI increased by 0.29, 0.63, and 1.49 units when the average temperature was less than 27.27 °C, between 27.27 and 30.17 °C, and higher than 30.17 °C, respectively. In addition, the effects of precipitation and relative humidity on BI became stronger when the average temperature exceeded the thresholds. Regarding the impacts of climate change on BI, our results showed that the potential effects on BI range from 3.5 to 54.42% under alternative temperature scenarios. By combining threshold regression techniques with count data regression models, this study provides evidence of threshold effects between climate factors and the dengue vector index. 
The proposed threshold of temperature could be incorporated into the implementation of public health measures and risk prediction to prevent and control dengue fever in the future.

Journal Article
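
The threshold step can be illustrated with a simplified, single-series version of the least-squares search behind threshold regression: for each candidate γ, fit separate lines below and above γ and keep the γ with the smallest pooled residual sum of squares. Assumptions of this sketch: temperature serves as both the regressor and the threshold variable, each regime gets its own intercept, and the panel fixed effects, the bootstrap test for the number of thresholds, and the subsequent negative binomial models used in the study are all omitted. Function names are hypothetical.

```python
import math

def ols_ssr(xs, ys):
    """Residual sum of squares of the least-squares line y = a + b * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx > 0 else 0.0
    a = my - b * mx
    return sum((y - a - b * x) ** 2 for x, y in zip(xs, ys))

def threshold_search(xs, ys, min_regime=5):
    """Grid-search the threshold gamma minimising the pooled SSR when separate
    lines are fitted to the regimes x <= gamma and x > gamma."""
    best_gamma, best_ssr = None, math.inf
    for gamma in sorted(set(xs)):
        low = [(x, y) for x, y in zip(xs, ys) if x <= gamma]
        high = [(x, y) for x, y in zip(xs, ys) if x > gamma]
        if len(low) < min_regime or len(high) < min_regime:
            continue  # skip thresholds that leave a regime nearly empty
        ssr = ols_ssr(*zip(*low)) + ols_ssr(*zip(*high))
        if ssr < best_ssr:
            best_gamma, best_ssr = gamma, ssr
    return best_gamma, best_ssr
```

Requiring `min_regime` points on each side keeps the search from selecting a threshold at the edge of the data, where one regime would be fitted to only a handful of observations.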
