Catalogue Search | MBRL
17 result(s) for "分类算法" ("classification algorithm")
Research on a Gastric Cancer Survival Prediction Model Based on the C5.0 Algorithm
2017
The incidence of gastric cancer in China is very high: newly diagnosed gastric cancer patients in China account for 42% of new cases worldwide every year, making gastric cancer a focus of malignant-tumor prevention and control in China. In this paper, the C5.0 classification algorithm is used to predict the survival rate of gastric cancer patients, and experiments are carried out on the SEER database of the U.S. National Cancer Institute. Data preprocessing and data integration methods are given to address the class imbalance of the gastric cancer records. The experimental results show that the C5.0 algorithm achieves higher accuracy and specificity than a BP neural network, and that there is an obvious correlation between patients' birthplace and their survival status. This study is a practical application of data mining in medicine and has some reference value for the clinical diagnosis of gastric cancer.
Journal Article
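For readers unfamiliar with C5.0: like its predecessor C4.5, it grows a decision tree by choosing, at each node, the attribute with the highest gain ratio (information gain divided by split information). The sketch below computes that split criterion in plain Python. It is not the authors' pipeline, and the toy "birthplace" values and survival labels are hypothetical, not SEER data.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain_ratio(attribute_values, labels):
    """Gain ratio of splitting `labels` on a categorical attribute (C4.5/C5.0-style)."""
    n = len(labels)
    groups = {}
    for value, label in zip(attribute_values, labels):
        groups.setdefault(value, []).append(label)
    # Expected entropy after the split, weighted by branch size
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    # Split information penalizes attributes that fragment the data into many branches
    split_info = -sum(len(g) / n * math.log2(len(g) / n) for g in groups.values())
    return gain / split_info if split_info > 0 else 0.0

# Hypothetical toy data: a birthplace attribute vs. survival status
birthplace = ["A", "A", "B", "B"]
status = ["alive", "alive", "dead", "dead"]
print(gain_ratio(birthplace, status))  # 1.0: the attribute separates the classes perfectly
```

An attribute that leaves each branch as mixed as the whole data set scores 0.0, so the tree would not split on it.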
Statistics-based Optimization of the Polarimetric Radar Hydrometeor Classification Algorithm and Its Application for a Squall Line in South China
by Chong WU; Liping LIU; Ming WEI; Baozhu XI; Minghui YU
in Algorithms, Atmospheric Sciences, Calibration
2018
A modified hydrometeor classification algorithm (HCA) for Chinese polarimetric radars is developed in this study, based on the U.S. operational HCA. A statistics-based optimization methodology is proposed, comprising calibration checking, dataset selection, modification of membership functions, modification of computation thresholds, and effect verification. These procedures are applied to the Zhuhai radar, the first operational polarimetric radar in South China. The systematic calibration bias is corrected, and it is found that the reliability of the radar measurements deteriorates when the signal-to-noise ratio is low and that the correlation coefficient within the melting layer is usually lower than that of the U.S. WSR-88D radar. Through modifications based on statistical analysis of the polarimetric variables, a localized HCA specifically for Zhuhai is obtained; it performs well in a one-month test against sounding and surface observations. The algorithm is then applied to a squall line on 11 May 2014 and provides reasonable detail in both horizontal and vertical structure, and the HCA results, especially in the mixed rain-hail region, can reflect the life cycle of the squall line. The kinematic and microphysical processes of cloud evolution and the differences between radar-detected hail and surface observations are also analyzed. The results of this study provide evidence supporting the improvement of this HCA developed specifically for China.
Journal Article
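HCAs in the U.S. operational family are fuzzy-logic classifiers: each polarimetric variable is passed through a per-class membership function, the memberships are combined with variable weights, and the class with the largest aggregate score is chosen. The sketch below illustrates only that structure. The two classes, the trapezoid breakpoints, and the weights are hypothetical placeholders, not the operational or Zhuhai-tuned values, and a real HCA uses more variables and classes.

```python
def trapezoid(x, x1, x2, x3, x4):
    """Trapezoidal membership: 0 outside (x1, x4), 1 on [x2, x3], linear ramps in between."""
    if x <= x1 or x >= x4:
        return 0.0
    if x2 <= x <= x3:
        return 1.0
    if x < x2:
        return (x - x1) / (x2 - x1)
    return (x4 - x) / (x4 - x3)

# Hypothetical membership breakpoints (NOT operational or Zhuhai-tuned values)
# for two classes over reflectivity Z (dBZ) and correlation coefficient rho_hv.
MEMBERSHIP = {
    "light_rain": {"Z": (5, 10, 35, 40), "rho_hv": (0.95, 0.97, 1.0, 1.01)},
    "rain_hail": {"Z": (40, 45, 75, 80), "rho_hv": (0.75, 0.80, 0.95, 0.97)},
}
WEIGHTS = {"Z": 1.0, "rho_hv": 0.8}  # hypothetical variable weights

def classify(obs):
    """Return the class with the largest weighted-mean membership, plus all scores."""
    scores = {}
    for name, functions in MEMBERSHIP.items():
        total = sum(WEIGHTS[v] * trapezoid(obs[v], *functions[v]) for v in functions)
        scores[name] = total / sum(WEIGHTS[v] for v in functions)
    return max(scores, key=scores.get), scores

print(classify({"Z": 55.0, "rho_hv": 0.90})[0])  # rain_hail
```

Localizing an HCA, as the abstract describes, amounts to re-tuning exactly these breakpoints, weights, and thresholds from local statistics.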
High-Impact Bug Report Identification with Imbalanced Learning Strategies
by Xin-Li Yang; David Lo; Xin Xia; Qiao Huang; Jian-Ling Sun
in Algorithms, Artificial Intelligence, Automation
2017
In practice, some bugs have more impact than others and thus deserve more immediate attention. Due to tight schedules and limited human resources, developers may not have enough time to inspect all bugs, so they often concentrate on the most impactful ones. In the literature, high-impact bugs are those that appear at unexpected times or locations and bring more unexpected effects (i.e., surprise bugs), or that break pre-existing functionality and harm the user experience (i.e., breakage bugs). Unfortunately, identifying high-impact bugs among thousands of bug reports in a bug tracking system is not an easy feat. An automated technique that identifies high-impact bug reports can help developers become aware of them early, fix them quickly, and minimize the damage they cause. Because only a small proportion of bugs are high-impact, the data are highly imbalanced, which makes identifying high-impact bug reports difficult. In this paper, we propose an approach to identify high-impact bug reports by leveraging imbalanced learning strategies. We investigate the effectiveness of various variants, each combining one imbalanced learning strategy with one classification algorithm. In particular, we choose four widely used strategies for dealing with imbalanced data and four state-of-the-art text classification algorithms, and conduct experiments on four datasets from four different open source projects. We mainly perform an analytical study on two types of high-impact bugs: surprise bugs and breakage bugs. The results show that different variants perform differently, and that the best-performing variants, SMOTE (synthetic minority over-sampling technique) + KNN (k-nearest neighbours) for surprise bug identification and RUS (random under-sampling) + NB (naive Bayes) for breakage bug identification, outperform the two state-of-the-art approaches by Thung et al. and by Garcia and Shihab in terms of F1-score.
Journal Article
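The two best-performing strategies named in the abstract rebalance the training data before classification. The sketch below shows basic SMOTE (interpolating between a minority sample and one of its k nearest minority neighbours) and random under-sampling on plain numeric vectors. In the paper the inputs are bug-report text features; the function names, `k`, and seeds here are illustrative assumptions.

```python
import math
import random

def smote(minority, n_new, k=3, seed=0):
    """Basic SMOTE: create n_new synthetic minority samples, each on the segment
    between a random minority sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        others = [m for j, m in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda m: math.dist(x, m))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, nb)])
    return synthetic

def random_undersample(majority, n_keep, seed=0):
    """Random under-sampling (RUS): keep n_keep majority samples chosen without replacement."""
    return random.Random(seed).sample(majority, n_keep)

minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(len(smote(minority, 6, k=2)))                   # 6
print(len(random_undersample(list(range(100)), 10)))  # 10
```

Over-sampling adds information-free but plausible minority points, while under-sampling discards majority data; which trade-off works better depends on the classifier, consistent with the abstract's finding that the best pairing differs by bug type.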
Classification of Precipitation Types Using Fall Velocity–Diameter Relationships from 2D-Video Distrometer Measurements
by Jeong-Eun LEE; Sung-Hwa JUNG; Hong-Mok PARK; Soohyun KWON; Pay-Liam LIN; Gyu Won LEE
in Algorithms, Atmospheric Sciences, Classification
2015
Fall velocity–diameter relationships for four snowflake types (dendrite, plate, needle, and graupel) were investigated in northeastern South Korea, and a new hydrometeor classification algorithm for distrometric measurements is proposed based on these relationships. Falling ice crystals (approximately 40,000 particles) were measured with a two-dimensional video disdrometer (2DVD) during a winter experiment from 15 January to 9 April 2010. The fall velocity–diameter relationships were derived for the four snowflake types based on manual classification by experts using snow photos and 2DVD measurements: the coefficients (exponents) were 0.82 (0.24) for dendrite, 0.74 (0.35) for plate, 1.03 (0.71) for needle, and 1.30 (0.94) for graupel. These new relationships established in the present study (PS) were compared with those from two previous studies. Hydrometeor types were classified with the derived fall velocity–diameter relationships, and the classification algorithm was evaluated using 3 × 3 contingency tables for one rain–snow transition event and three snowfall events. The algorithm performed well for the transition event: the critical success indices (CSIs) were 0.89, 0.61, and 0.71 for snow, wet snow, and rain, respectively. For the snow events, the algorithm performed better for dendrite and plate (CSIs = 1.0 and 1.0, respectively) than for needle and graupel (CSIs = 0.67 and 0.50, respectively).
Journal Article
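The coefficient/exponent pairs in the abstract imply the usual power-law form V = aD^b. The sketch below assigns a particle to the snowflake type whose curve lies closest to its measured (D, V) pair, and computes a per-class CSI from a 3 × 3 contingency table as hits / (hits + misses + false alarms). The nearest-curve rule, the units (D in mm, V in m/s, the common disdrometer convention), and the toy table are assumptions for illustration; the abstract does not spell out the exact decision rule.

```python
# Coefficients (a) and exponents (b) from the abstract, for V = a * D**b
RELATIONS = {
    "dendrite": (0.82, 0.24),
    "plate": (0.74, 0.35),
    "needle": (1.03, 0.71),
    "graupel": (1.30, 0.94),
}

def fall_velocity(d, a, b):
    """Power-law fall speed for diameter d (units assumed: mm in, m/s out)."""
    return a * d ** b

def classify_particle(d, v):
    """Assumed rule: pick the type whose V-D curve is nearest the observed speed at this diameter."""
    return min(RELATIONS, key=lambda t: abs(v - fall_velocity(d, *RELATIONS[t])))

def csi(table, i):
    """Critical success index for class i; rows = observed class, columns = classified class."""
    hits = table[i][i]
    misses = sum(table[i]) - hits
    false_alarms = sum(row[i] for row in table) - hits
    return hits / (hits + misses + false_alarms)

print(classify_particle(2.0, 2.5))  # graupel
toy = [[8, 1, 0], [1, 5, 1], [0, 0, 4]]  # hypothetical counts, not the paper's data
print(csi(toy, 0))  # 0.8
```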
Classifying Uncertain and Evolving Data Streams with Distributed Extreme Learning Machine
2015
Conventional classification algorithms are not well suited to the inherent uncertainty, potential concept drift, volume, and velocity of streaming data. Specialized algorithms are needed to obtain efficient and accurate classifiers for uncertain data streams. In this paper, we first introduce Distributed Extreme Learning Machine (DELM), an optimization of ELM for large matrix operations over large datasets. We then present the Weighted Ensemble Classifier Based on Distributed ELM (WE-DELM), an online, one-pass algorithm for efficiently classifying uncertain streaming data with concept drift. A probability world model is built to transform uncertain streaming data into certain streaming data. Base classifiers are learned using DELM, and their weights are updated dynamically according to their classification results. WE-DELM improves both the efficiency of learning the model and the accuracy of classification. Experimental results show that WE-DELM achieves better performance on several evaluation criteria, including efficiency, accuracy, and speedup.
Journal Article
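WE-DELM's ensemble layer combines base classifiers by weighted voting and updates their weights from recent classification results. The sketch below illustrates that idea only: the base classifiers are plain Python callables standing in for DELM models, re-weighting each member by its accuracy on the latest labelled chunk is an assumed update rule, and the probability world model for uncertain data is not modelled.

```python
from collections import defaultdict

class WeightedEnsemble:
    """Weighted-vote ensemble whose member weights track accuracy on recent data.
    Members are any callables x -> label (stand-ins for DELM base classifiers)."""

    def __init__(self, members):
        self.members = list(members)
        self.weights = [1.0] * len(self.members)

    def predict(self, x):
        votes = defaultdict(float)
        for clf, w in zip(self.members, self.weights):
            votes[clf(x)] += w
        return max(votes, key=votes.get)

    def update(self, chunk):
        """Assumed rule: set each member's weight to its accuracy on a new labelled chunk [(x, y), ...]."""
        if not chunk:
            return
        for i, clf in enumerate(self.members):
            correct = sum(1 for x, y in chunk if clf(x) == y)
            self.weights[i] = correct / len(chunk)

members = [lambda x: 0, lambda x: 1, lambda x: int(x > 0.5)]
ens = WeightedEnsemble(members)
print(ens.predict(0.9))  # 1: two of three equally weighted members vote 1
ens.update([(0.2, 0), (0.4, 0), (0.6, 0), (0.8, 0)])  # concept drift: everything is now class 0
print(ens.weights)       # [1.0, 0.0, 0.5]
print(ens.predict(0.9))  # 0
```

Because weights follow recent accuracy, members that fit an outdated concept lose influence after drift without being retrained, which is what lets this kind of ensemble stay one-pass.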
Multi-algorithm and multi-model based drug target prediction and web server
by Ying-tao LIU; Yi LI; Zi-fu HUANG; Zhi-jian XU; Zhuo YANG; Zhu-xi CHEN; Kai-xian CHEN; Ji-ye SHI; Wei-liang ZHU
in Algorithms, Amino Acid Sequence, Biomedical and Life Sciences
2014
Aim: To develop a reliable computational approach for predicting potential drug targets based solely on protein sequences. Methods: Using prepared drug-target and non-target datasets and three classification algorithms (Support Vector Machine, Neural Network, and Decision Tree), a multi-algorithm, multi-model strategy was employed to construct models for predicting potential drug targets. Results: Twenty-one prediction models were successfully developed for each of the three algorithms. Our evaluation indicated that ~30% of human proteins are potential drug targets, and that ~40% of the putative targets of drugs undergoing phase II clinical trials are probably non-targets. A public web server named D3TPredictor (http://www.d3pharma.com/d3tpredictor) was constructed to provide easy access. Conclusion: Reliable and robust drug target prediction based on protein sequences is achieved using the multi-algorithm, multi-model strategy.
Journal Article
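The abstract does not say how the 21 models per algorithm are combined into a final prediction. One simple, hypothetical way to turn a multi-algorithm, multi-model ensemble into a single call is to average each algorithm's fraction of "target" votes and compare the result with a threshold. The function name, the averaging scheme, and the 0.5 threshold below are all assumptions, not D3TPredictor's method.

```python
def consensus_call(votes_by_algorithm, threshold=0.5):
    """votes_by_algorithm maps an algorithm name to its models' 0/1 votes (1 = target).
    Returns the per-algorithm target fractions and a combined target / non-target call."""
    fractions = {alg: sum(v) / len(v) for alg, v in votes_by_algorithm.items()}
    mean_fraction = sum(fractions.values()) / len(fractions)
    return fractions, mean_fraction >= threshold

votes = {"SVM": [1] * 21, "NN": [1] * 14 + [0] * 7, "DT": [0] * 21}  # hypothetical votes
fractions, is_target = consensus_call(votes)
print(is_target)  # True: the mean fraction is (1 + 2/3 + 0) / 3, about 0.56
```

Averaging per algorithm first keeps one algorithm with many models from dominating; the per-algorithm fractions also expose disagreement between algorithms.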
Misleading classification
In this paper, we investigate a new problem, misleading classification, in which each test instance is associated with an original class and a misleading class. The data owner's goal is to build the training set from candidate instances so that the data miner is misled into classifying the test instances into their misleading classes rather than their original classes. We discuss two cases of misleading classification. For the case where the classification algorithm is unknown to the data owner, we propose a KNN-based Ranking Algorithm (KRA) that ranks all candidate instances based on the similarities between candidate instances and test instances. For the case where the classification algorithm is known, we propose a Greedy Ranking Algorithm (GRA) that evaluates each candidate instance by building a classifier to predict the test set. We also show how to accelerate GRA incrementally when naive Bayes is the classification algorithm. Experiments on 16 UCI data sets indicate that the candidate instances ranked by KRA achieve promising leaking and misleading rates. When the classification algorithm is known, GRA dramatically outperforms KRA in terms of leaking and misleading rates, though it requires more running time.
Journal Article
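The abstract gives KRA's principle (rank candidate instances by their similarity to the test instances) but not its scoring rule. The sketch below is one hypothetical KNN-style reading, not the paper's algorithm: a candidate earns similarity credit each time it is among a test instance's k nearest candidates and carries that test instance's misleading class, and candidates are ranked by total credit. Similarity as 1 / (1 + Euclidean distance) is also an assumption.

```python
import math

def rank_candidates(candidates, tests, k=2):
    """candidates: [(features, label)]; tests: [(features, misleading_label)].
    Hypothetical scoring: credit a candidate with its similarity whenever it is among a
    test instance's k nearest candidates AND its label equals that test's misleading class."""
    scores = [0.0] * len(candidates)
    for x_t, misleading in tests:
        sims = [(1.0 / (1.0 + math.dist(x_t, x_c)), i) for i, (x_c, _) in enumerate(candidates)]
        for sim, i in sorted(sims, reverse=True)[:k]:
            if candidates[i][1] == misleading:
                scores[i] += sim
    # Candidate indices, most useful for misleading the data miner first
    return sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)

candidates = [([0.0, 0.0], "A"), ([0.0, 0.1], "B"), ([5.0, 5.0], "B")]
tests = [([0.0, 0.02], "B")]  # one test instance whose misleading class is "B"
print(rank_candidates(candidates, tests))  # [1, 0, 2]
```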
Discovering Family Groups in Passenger Social Networks
2015
People often travel in groups for different purposes: family members visiting relatives, colleagues on business, friends sightseeing, and so on. Family groups in particular, as one of the most common consumer units, make up a considerable share of the passenger transportation market. Accurately identifying family groups can help carriers provide passengers with personalized travel services and precise product recommendations. This paper studies the problem of finding family groups in civil aviation and proposes a family group detection method based on passenger social networks. First, we construct passenger social networks from co-travel behaviors extracted from historical travel records; second, we use a collective classification algorithm to classify the social relationships between passengers as family or non-family relationships; finally, we employ a weighted community detection algorithm, which uses the relationship classification results as edge weights, to find family groups. Experimental results on a real dataset of civil aviation passenger travel records demonstrate that our method can effectively find family groups from historical travel records.
Journal Article
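The final step feeds relationship-classification scores into a weighted community detection algorithm. As a much simpler stand-in (explicitly not the paper's algorithm), the sketch below keeps only edges whose family probability reaches a threshold and returns the resulting connected components using union-find. The passenger IDs, probabilities, and the 0.5 threshold are hypothetical.

```python
def family_groups(edges, threshold=0.5):
    """edges: (passenger_a, passenger_b, p_family) triples from a relationship classifier.
    Keeps edges with p_family >= threshold; returns connected components of size >= 2."""
    parent = {}

    def find(u):
        parent.setdefault(u, u)
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u

    for a, b, p in edges:
        if p >= threshold:
            parent[find(a)] = find(b)  # union the two components
    groups = {}
    for u in list(parent):
        groups.setdefault(find(u), set()).add(u)
    return sorted((g for g in groups.values() if len(g) > 1), key=len, reverse=True)

edges = [("p1", "p2", 0.9), ("p2", "p3", 0.8), ("p4", "p5", 0.2), ("p5", "p6", 0.7)]
print(family_groups(edges))  # p1, p2, p3 form one group; p5 and p6 form another
```

A true weighted community detection method can split a connected component into several groups when internal weights are uneven; thresholding plus components cannot, which is why it is only a rough approximation.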
Classifying and clustering in negative databases
2013
Negative databases (NDBs) have recently been proposed for privacy protection. As with traditional databases, basic operations such as select, intersection, update, and delete can be conducted over NDBs. However, neither classification nor clustering in negative databases has been studied yet. This paper therefore proposes two algorithms for NDBs: a k-nearest-neighbor (kNN) classification algorithm and a k-means clustering algorithm. The core of both algorithms is a novel method for estimating the Hamming distance between a binary string and an NDB. Experimental results demonstrate that classification and clustering in NDBs are promising.
Journal Article
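The kNN classifier described above is built on Hamming distance. The sketch below shows ordinary kNN over explicit binary strings; it does not reproduce the paper's key contribution, estimating the Hamming distance between a string and a negative database, and the training strings and labels are hypothetical.

```python
from collections import Counter

def hamming(a, b):
    """Number of positions at which two equal-length binary strings differ."""
    return sum(x != y for x, y in zip(a, b))

def knn_classify(query, training, k=3):
    """training: [(binary_string, label)]. Majority label among the k nearest by Hamming distance."""
    nearest = sorted(training, key=lambda item: hamming(query, item[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

training = [("0000", "a"), ("0001", "a"), ("0011", "a"),
            ("1111", "b"), ("1110", "b"), ("1100", "b")]
print(knn_classify("0010", training))  # a
print(knn_classify("1101", training))  # b
```

In the NDB setting, `hamming(query, item[0])` would be replaced by the paper's distance estimate, since the positive records are not stored explicitly.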
Multi-Domain Sentiment Classification with Classifier Combination
2011
State-of-the-art studies on sentiment classification are typically domain-dependent and domain-restricted. In this paper, we aim to reduce domain dependency and improve overall performance simultaneously by proposing an efficient multi-domain sentiment classification algorithm based on multiple classifier combination: we first train single-domain classifiers separately on domain-specific data and then combine the classifiers for the final decision. Our experiments show that this approach performs much better than both the single-domain classification approach (using each domain's training data individually) and the mixed-domain classification approach (simply pooling all the training data). In particular, classifier combination with the weighted sum rule obtains an average error reduction of 27.6% over single-domain classification.
Journal Article
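The weighted sum rule scores each class by a weighted sum of the single-domain classifiers' class probabilities and picks the highest-scoring class. The sketch below assumes each classifier returns a {label: probability} dictionary; the probabilities and weights are hypothetical, and the abstract does not describe how the paper sets the weights.

```python
def weighted_sum_combine(domain_probs, weights):
    """domain_probs: one {label: probability} dict per single-domain classifier;
    weights: one non-negative weight per classifier. Returns the label with the largest weighted sum."""
    totals = {}
    for probs, w in zip(domain_probs, weights):
        for label, p in probs.items():
            totals[label] = totals.get(label, 0.0) + w * p
    return max(totals, key=totals.get)

# Hypothetical outputs of three single-domain classifiers for one review
probs = [{"pos": 0.6, "neg": 0.4}, {"pos": 0.3, "neg": 0.7}, {"pos": 0.55, "neg": 0.45}]
print(weighted_sum_combine(probs, [1, 1, 1]))  # neg
print(weighted_sum_combine(probs, [2, 1, 1]))  # pos: trusting the first domain more flips the decision
```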