Catalogue Search | MBRL

by Farcomeni, Alessio , Greco, Luca in Computer programs , Data reduction , Dimension reduction (Statistics)

2016, 2015

This book gives a non-technical overview of robust data reduction techniques, encouraging the use of these important and useful methods in practical applications. The main areas covered include principal components analysis, sparse principal component analysis, canonical correlation analysis, factor analysis, clustering, double clustering, and discriminant analysis. Using real examples, the authors show how to implement the procedures in R. The code and data for the examples are available on the book's CRC Press web page.

eBook

A Linked Data Application for Harmonizing Heterogeneous Biomedical Information

by Capuano, Nicola , Ritrovato, Pierluigi , Foggia, Pasquale in biomedical ontologies , Cancer , Data warehouses

2022

In the biomedical field, there is an ever-increasing number of large, fragmented, and isolated data sources stored in databases and ontologies that use heterogeneous formats and poorly integrated schemes. Researchers and healthcare professionals find it extremely difficult to master this huge amount of data and extract relevant information. In this work, we propose a linked data approach, based on multilayer networks and semantic Web standards, capable of integrating and harmonizing several biomedical datasets with different schemas and semi-structured data through a multi-model database providing polyglot persistence. The domain chosen concerns the analysis and aggregation of available data on neuroendocrine neoplasms (NENs), a relatively rare type of neoplasm. Integrated information includes twelve public datasets available in heterogeneous schemas and formats including RDF, CSV, TSV, SQL, OWL, and OBO. The proposed integrated model consists of six interconnected layers representing, respectively, information on the disease, the related phenotypic alterations, the affected genes, the related biological processes, molecular functions, the involved human tissues, and drugs and compounds that show documented interactions with them. The defined scheme extends an existing three-layer model covering a subset of the mentioned aspects. A client–server application was also developed to browse and search for information on the integrated model. The main challenges of this work concern the complexity of the biomedical domain, the syntactic and semantic heterogeneity of the datasets, and the organization of the integrated model. Unlike related works, multilayer networks have been adopted to organize the model in a manageable and stratified structure, without the need to change the original datasets but by transforming their data “on the fly” to respond to user requests.

Journal Article

Robust Fitting of a Wrapped Normal Model to Multivariate Circular Data and Outlier Detection

by Saraceno, Giovanni , Greco, Luca , Agostinelli, Claudio in Algorithms , classification , mahalanobis distance

2021

In this work, we deal with a robust fitting of a wrapped normal model to multivariate circular data. Robust estimation is supposed to mitigate the adverse effects of outliers on inference. Furthermore, the use of a proper robust method leads to the definition of effective outlier detection rules. Robust fitting is achieved by a suitable modification of a classification-expectation-maximization algorithm that has been developed to perform a maximum likelihood estimation of the parameters of a multivariate wrapped normal distribution. The modification concerns the use of complete-data estimating equations that involve a set of data dependent weights aimed to downweight the effect of possible outliers. Several robust techniques are considered to define weights. The finite sample behavior of the resulting proposed methods is investigated by some numerical studies and real data examples.

Journal Article

Spectral Characterization and Spatiotemporal Variability of the Background Seismic Noise in Italy

by D’Alessandro, Antonino , Lauciani, Valentino , Scudero, Salvatore in ambient noise , Anthropogenic factors , background seismic noise

2021

In this study, we assess the spectral characteristics of seismic noise at the sites of the Italian Seismic Network and its spatio‐temporal variability. The evaluation of noise is crucial for the assessment of the detection capability of a seismic network. We selected a set of 233 stations, those equipped with broadband velocimeters (with corner period > 40 s) and operating continuously for at least four consecutive years. The analysis was carried out in the frequency band from 0.025 to 30 Hz, in accordance with the bandwidth of the seismic sensors. We estimated the Power Spectral Density (PSD) of the seismic noise for fixed temporal windows and then we calculated the Probability Density Functions (PDF) at each station. Exploiting the large data set available, we have been able to: (a) describe the characteristics of the noise power at each site; (b) investigate both temporal and spatial variations of the background noise, revealing correlations of the noise levels with natural and anthropogenic noise sources; (c) propose an empirical relationship linking the “microseismic” noise (i.e., 0.12–1.2 Hz) with the geographical features of the site hosting the seismic station; (d) establish the baselines of a new seismic noise model that could be considered as a new reference for the Italian territory.

Key Points:
Background seismic noise from 233 Broad‐Band (BB) seismic stations has been analyzed by means of the probability density function of power spectral density.
Temporal and spatial variations of noise are ascribable to local geological, environmental, and geographical features.
A noise model of background seismic noise is proposed as a reference for Italy.

Journal Article

A weighted strategy to handle likelihood uncertainty in Bayesian inference

by Greco, Luca , Agostinelli, Claudio in Bayesian analysis , Distribution , Economic Theory/Quantitative Economics/Mathematical Methods

2013

The sensitivity of posterior inferences to model specification can be considered as an indicator of the presence of outliers, that is, values that are highly unlikely under the assumed model. The occurrence of anomalous values can seriously alter the shape of the likelihood function and lead to posterior distributions far from those one would obtain without these data inadequacies. To deal with these hindrances, a robust approach is discussed, which allows us to obtain outlier-resistant posterior distributions with properties similar to those of a proper posterior distribution. The methodology is based on the replacement of the genuine likelihood by a weighted likelihood function in Bayes’ formula.

Journal Article

Reversible Conversion Formulas Based on Partial Symmetric Linear Regression Models

by Greco, Luca , Luta, George in Random variables , Regression analysis , Regression models

2025

Symmetric regression deals with a reversible functional relationship involving a set of variables, where all of them are measured with error and it is not meaningful to consider one as the response and the remaining ones as explanatory. Therefore, it is unsuitable to study any functional (linear) relationship between them by fixing one direction of the regression rather than the other. The scope of symmetric regression can be expanded by considering a partial symmetric linear regression where the functional relationship is controlled for other variables, which are not assumed to be error-prone. In this context, the word “partial” means that we are not interested in a fully symmetric relationship between all the variables but in a symmetric and reversible relationship that holds for some variables of interest, whose functional relationship is of primary concern, for any given value of the control variables. Therefore, a partial symmetric regression modeling strategy is developed within a very general framework that includes different symmetric regression strategies. The finite sample behaviors of the proposed estimators are investigated through numerical studies and illustrated with an application to rheumatology data to find a reversible conversion formula between the Stanford Health Assessment Questionnaire (HAQ) score and the Multi-Dimensional HAQ (MDHAQ) score.

Journal Article

On testing the equality between interquartile ranges

by Greco, Luca , Luta, George , Wilcox, Rand in Contingency tables , Equality , Numerical analysis

2024

The interquartile range is a statistical measure well suited to describe the variability of the data at hand, both at the population level and for sample data. The interquartile range is particularly useful when the distribution of the data is asymmetric or irregularly shaped. Here, the use of the interquartile range is investigated when the main aim is to compare the variability of two distributions using two independent random samples, without the need to make any distributional assumptions. Several techniques are compared through numerical studies and real data examples, with particular attention given to the use of sample quantiles based on the Harrell-Davis estimator or on quantile regression.
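
As a rough illustration of comparing two interquartile ranges without distributional assumptions, the sketch below builds a percentile-bootstrap confidence interval for the difference between the IQRs of two independent samples. It is not the procedure studied in the article: the quartiles come from the Python standard library's default method rather than the Harrell-Davis estimator or quantile regression, and the function names, the number of resamples, and the seed are all assumptions made for this example.

```python
import random
import statistics


def iqr(sample):
    """Interquartile range using statistics.quantiles (default 'exclusive' method)."""
    q1, _, q3 = statistics.quantiles(sample, n=4)
    return q3 - q1


def bootstrap_iqr_diff_ci(x, y, n_boot=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for IQR(x) - IQR(y).

    x and y are independent samples; each is resampled with replacement
    on its own, so no distributional shape is assumed.
    """
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        xb = rng.choices(x, k=len(x))  # bootstrap resample of x
        yb = rng.choices(y, k=len(y))  # bootstrap resample of y
        diffs.append(iqr(xb) - iqr(yb))
    diffs.sort()
    lower = diffs[int((alpha / 2) * n_boot)]
    upper = diffs[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper
```

If the resulting interval excludes zero, the two samples would be judged to differ in variability at the chosen level under this simple scheme.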

Journal Article

Sentiment analysis for customer relationship management: an incremental learning approach

by Capuano, Nicola , Ritrovato, Pierluigi , Vento, Mario in Algorithms , Brand loyalty , Classifiers

2021

In recent years there has been a significant rethinking of corporate management, which is increasingly based on customer orientation principles. As a matter of fact, customer relationship management processes and systems are ever more popular and crucial to facing today’s business challenges. However, the large number of available customer communication stimuli coming from different (direct and indirect) channels requires automatic language processing techniques to help filter and qualify such stimuli, determine priorities, facilitate the routing of requests and reduce response times. In this scenario, sentiment analysis plays an important role in measuring customer satisfaction, tracking consumer opinion, interacting with consumers and building customer loyalty. The research described in this paper proposes an approach based on Hierarchical Attention Networks for detecting the sentiment polarity of customer communications. Unlike other existing approaches, after initial training, the defined model can improve over time during system operation using the feedback provided by CRM operators thanks to an integrated incremental learning mechanism. The paper also describes the developed prototype as well as the dataset used for training the model, which includes over 30,000 annotated items. The results of two experiments aimed at measuring classifier performance and validating the retraining mechanism are also presented and discussed. In particular, the classifier accuracy turned out to be better than that of other algorithms for the supported languages (macro-averaged F1-score of 0.89 and 0.79 for Italian and English respectively) and the retraining mechanism was able to improve the classification accuracy on new samples without degrading the overall system performance.
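
For readers unfamiliar with the metric quoted in this abstract, the following minimal sketch shows how a macro-averaged F1-score is computed: the F1-score is calculated separately for each class and the per-class values are averaged with equal weight. The function name and the toy labels are assumptions for illustration only and have no connection to the paper's dataset or code.

```python
def macro_f1(y_true, y_pred):
    """Macro-averaged F1: unweighted mean of the per-class F1-scores."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)  # true positives
        fp = sum(t != label and p == label for t, p in pairs)  # false positives
        fn = sum(t == label and p != label for t, p in pairs)  # false negatives
        # F1 = 2*TP / (2*TP + FP + FN); the denominator is positive because
        # every label considered occurs in y_true or y_pred.
        scores.append(2 * tp / (2 * tp + fp + fn))
    return sum(scores) / len(scores)


# Toy polarity labels (hypothetical)
truth = ["pos", "pos", "neg", "neg"]
predicted = ["pos", "neg", "neg", "neg"]
score = macro_f1(truth, predicted)
```

Because every class counts equally, a rare class that is classified poorly lowers the macro average as much as a frequent one would.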

Journal Article

Weighted likelihood estimation of multivariate location and scatter

by Greco, Luca , Agostinelli, Claudio in Asymptotic properties , Data analysis , Economic models

2019

A novel approach to obtain weighted likelihood estimates of multivariate location and scatter is discussed. A weighting scheme is proposed that is based on the univariate distribution of the Mahalanobis distances rather than the multivariate distribution of the data at the assumed model. This strategy avoids the curse of dimensionality affecting multivariate non-parametric density estimation, which is involved in the construction of the weights through the Pearson residuals. Asymptotic properties of the proposed weighted likelihood estimator are also discussed. Then, weighted likelihood-based outlier detection rules and robust dimensionality reduction techniques are developed. The effectiveness of the methodology is illustrated through some numerical studies and real data examples.
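
To make the role of the Mahalanobis distances concrete, here is a minimal two-dimensional sketch of a distance-based outlier rule. It does not implement the weighted likelihood estimator described in the abstract: the location and scatter are assumed to be supplied by the caller (ideally from a robust fit), and the rule simply compares squared distances with a chi-square quantile, which for 2 degrees of freedom has the closed form -2 ln(1 - q). All function names and the 0.975 level are assumptions for this example.

```python
import math


def mahalanobis_sq_2d(points, center, cov):
    """Squared Mahalanobis distances of 2-D points from `center` under `cov`."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det <= 0:
        raise ValueError("covariance matrix must have a positive determinant")
    # Inverse of a 2x2 matrix, written out explicitly
    inv = ((d / det, -b / det), (-c / det, a / det))
    distances = []
    for x, y in points:
        dx, dy = x - center[0], y - center[1]
        distances.append(
            dx * (inv[0][0] * dx + inv[0][1] * dy)
            + dy * (inv[1][0] * dx + inv[1][1] * dy)
        )
    return distances


def chi2_quantile_df2(q):
    """Quantile of the chi-square distribution with 2 degrees of freedom."""
    return -2.0 * math.log(1.0 - q)


def flag_outliers_2d(points, center, cov, level=0.975):
    """Flag points whose squared distance exceeds the chi-square cutoff."""
    cutoff = chi2_quantile_df2(level)
    return [d2 > cutoff for d2 in mahalanobis_sq_2d(points, center, cov)]
```

Working with the distances reduces the problem to one dimension regardless of how many variables are observed, which is the feature the abstract highlights.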

Journal Article
