Catalogue Search | MBRL
Explore the vast range of titles available.
18 result(s) for "Fang, Shih-Hau"
Using SincNet for Learning Pathological Voice Disorders
by Fang, Shih-Hau; Hung, Chao-Hsiang; Wang, Chi-Te
in Acoustics, classification, convolutional neural network
2022
Deep learning techniques such as convolutional neural networks (CNNs) have been successfully applied to identify pathological voices. However, the major disadvantage of these advanced models is their lack of interpretability in explaining the predicted outcomes. This drawback creates a bottleneck for promoting voice-disorder classification and detection systems, especially during the pandemic. In this paper, we propose using a series of learnable sinc functions to replace the first layer of a commonly used CNN, yielding an explainable SincNet system for classifying or detecting pathological voices. The sinc filters, which act as a front-end signal processor in SincNet, are critical for constructing a meaningful first layer and are used directly to extract the acoustic features from which the subsequent network layers generate high-level voice information. We conducted our tests on three different Far Eastern Memorial Hospital voice datasets. In our evaluations, the proposed approach improves accuracy by up to 7% and sensitivity by up to 9% over conventional methods, demonstrating the superior performance of the SincNet system in classifying input pathological waveforms. More importantly, based on our evaluation results, we offer possible explanations of the relationship between the system output and the speech features extracted by the first layer.
Journal Article
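The abstract above describes replacing a CNN's first layer with learnable sinc functions. As a rough illustration only, the sketch below builds one band-pass kernel as the difference of two low-pass sinc filters, which is the usual way a SincNet-style filter is parameterised by a low and a high cutoff frequency. The function names, the Hamming window, the kernel size, and the cutoff values are assumptions for illustration and are not taken from the paper; in a trained network the two cutoffs would be learned rather than fixed.

```python
import math
from typing import List


def sinc(x: float) -> float:
    """Normalized sinc, sin(pi*x) / (pi*x), with sinc(0) defined as 1."""
    if x == 0.0:
        return 1.0
    return math.sin(math.pi * x) / (math.pi * x)


def sinc_bandpass(f_low: float, f_high: float, kernel_size: int,
                  sample_rate: float) -> List[float]:
    """Band-pass FIR kernel: difference of two windowed low-pass sinc filters.

    f_low and f_high (in Hz) play the role of the two learnable parameters
    of one filter. Hypothetical helper, not the authors' code.
    """
    if kernel_size < 3 or kernel_size % 2 == 0:
        raise ValueError("kernel_size must be an odd integer >= 3")
    if not 0.0 <= f_low < f_high <= sample_rate / 2:
        raise ValueError("need 0 <= f_low < f_high <= sample_rate / 2")
    half = (kernel_size - 1) // 2
    kernel = []
    for n in range(kernel_size):
        k = n - half  # sample offset from the kernel centre
        low = 2 * f_low / sample_rate * sinc(2 * f_low * k / sample_rate)
        high = 2 * f_high / sample_rate * sinc(2 * f_high * k / sample_rate)
        # A Hamming window (an assumption here) smooths the truncation.
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (kernel_size - 1))
        kernel.append((high - low) * window)
    return kernel


def gain(kernel: List[float], freq: float, sample_rate: float) -> float:
    """Magnitude of the kernel's frequency response at freq (Hz)."""
    re = sum(h * math.cos(2 * math.pi * freq * n / sample_rate)
             for n, h in enumerate(kernel))
    im = sum(h * math.sin(2 * math.pi * freq * n / sample_rate)
             for n, h in enumerate(kernel))
    return math.hypot(re, im)


# Illustrative values only: a 300-3400 Hz band at a 16 kHz sampling rate.
kernel = sinc_bandpass(300.0, 3400.0, kernel_size=101, sample_rate=16000.0)
print(round(gain(kernel, 1000.0, 16000.0), 3), round(gain(kernel, 6000.0, 16000.0), 3))
```

Because each filter is fully described by two frequencies, its pass band can be read off directly, which is the kind of first-layer interpretability the abstract refers to.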
Forecasting Air Quality in Taiwan by Using Machine Learning
2020
This study proposes a gradient-boosting-based machine learning approach for predicting the PM2.5 concentration in Taiwan. The proposed mechanism is evaluated on a large-scale database built by the Environmental Protection Administration and the Central Weather Bureau, Taiwan, which includes data from 77 air monitoring stations and 580 weather stations performing hourly measurements over 1 year. By learning from past records of PM2.5 and neighboring weather stations' climatic information, the forecasting model works well for 24-h prediction at most air stations. This study also investigates the geographical and meteorological divergence in the forecasting results of seven regional monitoring areas. We also compare the prediction performance among Taiwan, Taipei, and London; analyze the impact of industrial pollution; and propose an enhanced version of the prediction model to improve the prediction accuracy. The results indicate that Taipei and London have similar prediction results because these two cities have similar topography (basin) and are financial centers without domestic pollution sources. The results also suggest that after considering industrial impacts by incorporating additional features from the Taichung and Thong-Siau power plants, the proposed method achieves significant improvement in the coefficient of determination (R²) from 0.58 to 0.71. Moreover, for Taichung City the root-mean-square error decreases from 8.56 for the conventional approach to 7.06 for the proposed method.
Journal Article
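The abstract above reports its results as a coefficient of determination (R²) and a root-mean-square error. For readers unfamiliar with these two metrics, a minimal sketch of their standard definitions follows. The function names and the sample values are made up for illustration and have no connection to the study's data.

```python
import math
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root-mean-square error between observations and predictions."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up hourly concentrations and forecasts, purely illustrative.
observed = [10.0, 20.0, 30.0, 40.0]
forecast = [12.0, 18.0, 33.0, 39.0]
print(rmse(observed, forecast), r_squared(observed, forecast))
```

A higher R² and a lower RMSE both indicate a better fit, which is the direction of the improvements the abstract reports.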
Improved Speech Authenticity Detection in Chinese–English Bilingual Contexts
by Chang, Sheng-Chain; Tsai, Cheng-Yuan; Fang, Shih-Hau
in Accuracy, acoustic signal processing, Algorithms
2024
The rapid evolution of voice technology has heightened the need for robust detection systems to distinguish between authentic and tampered speech. Recent competitions have significantly advanced the development of countermeasures against spoofing attacks. However, while advancements in detection technologies have been notable, existing methods often focus on a single type of tampering and a single language. Our contribution lies in developing an improved model that integrates an enhanced ResNet architecture with an LSTM to improve the detection of tampered audio, particularly in challenging multilingual scenarios. In the experiments, we built a hybrid dataset from self-recorded Chinese speech and public VCTK2 English samples, enhanced the ResNet model's generalization capabilities, and evaluated our approach using the bilingual dataset. Experimental results demonstrate that the proposed approach achieves superior performance with an equal error rate of 11.62%, even under bilingual conditions, and, more importantly, outperforms the leading models from the ASVSpoof 2021 and ADD 2022 competitions. We also employed advanced tampering techniques, including CycleGAN voice conversion and auto splicing, to simulate real-world tampering scenarios and verify the effectiveness of the proposed approach.
Journal Article
Detection of Audio Tampering Based on Electric Network Frequency Signal
2023
The detection of audio tampering plays a crucial role in ensuring the authenticity and integrity of multimedia files. This paper presents a novel approach to identifying tampered audio files by leveraging the unique Electric Network Frequency (ENF) signal, which is inherent to the power grid and serves as a reliable indicator of authenticity. The study begins by establishing a comprehensive Chinese ENF database containing diverse ENF signals extracted from audio files. The proposed methodology involves extracting the ENF signal, applying wavelet decomposition, and utilizing the autoregressive model to train effective classification models. Subsequently, the framework is employed to detect audio tampering and assess the influence of various environmental conditions and recording devices on the ENF signal. Experimental evaluations conducted on our Chinese ENF database demonstrate the efficacy of the proposed method, achieving impressive accuracy rates ranging from 91% to 93%. The results emphasize the significance of ENF-based approaches in enhancing audio file forensics and reaffirm the necessity of adopting reliable tamper detection techniques in multimedia authentication.
Journal Article
Off-Line Evaluation of Mobile-Centric Indoor Positioning Systems: The Experiences from the 2017 IPIN Competition
by Jiménez, Antonio; Chien, Ying-Ren; Lu, Wen-Chen
in Benchmarking, Cellular telephone systems, Competitions
2018
The development of indoor positioning solutions using smartphones is a growing activity with enormous potential for everyday life and professional applications. Research on this topic concentrates on developing new positioning solutions that are tested in specific environments under their own evaluation metrics. To explore the real positioning quality of smartphone-based solutions and their ability to adapt seamlessly to different scenarios, fair evaluation frameworks are needed. The design of competitions using extensive pre-recorded datasets is a valid way to generate open data for comparing the different solutions created by research teams. In this paper, we discuss the details of the 2017 IPIN indoor localization competition, the different datasets created, the teams participating in the event, and the results they obtained. We compare these results with other competition-based approaches (Microsoft and Perf-loc) and on-line evaluation web sites. The lessons learned by organising these competitions and the benefits for the community are addressed throughout the paper. Our analysis paves the way for future developments in the standardization of evaluations and for creating a widely adopted benchmark strategy for researchers and companies in the field.
Journal Article
Transportation Modes Classification Using Sensors on Smartphones
2016
This paper investigates the classification of transportation and vehicular modes using big data from smartphone sensors. Three types of sensors are used: the accelerometer, magnetometer, and gyroscope. This study proposes improved features and uses three machine learning algorithms (decision trees, K-nearest neighbor, and support vector machine) to classify the user's transportation and vehicular modes. In the experiments, we compare the performance from different perspectives, including the accuracy for both modes, the execution time, and the model size. Results show that the proposed features enhance the accuracy, and that the support vector machine provides the best classification accuracy but requires the longest prediction time. This paper also investigates vehicular mode classification and compares the results with those for the transportation modes.
Journal Article
Ambulatory Phonation Monitoring With Wireless Microphones Based on the Speech Energy Envelope: Algorithm Development and Validation
2020
Voice disorders mainly result from chronic overuse or abuse, particularly in occupational voice users such as teachers. Previous studies proposed a contact microphone attached to the anterior neck for ambulatory voice monitoring; however, the inconvenience associated with taping and wiring, along with the lack of real-time processing, has limited its clinical application.
This study aims to (1) propose an automatic speech detection system using wireless microphones for real-time ambulatory voice monitoring, (2) examine the detection accuracy in controlled and noisy environments, and (3) report the results of the phonation ratio in practical scenarios.
We designed an adaptive threshold function to detect the presence of speech based on the energy envelope. We invited 10 teachers to participate in this study and tested the performance of the proposed automatic speech detection system regarding detection accuracy and phonation ratio. Moreover, we investigated whether the unsupervised noise reduction algorithm (ie, log minimum mean square error) can overcome the influence of environmental noise in the proposed system.
The proposed system exhibited an average accuracy of speech detection of 89.9%, ranging from 81.0% (67,357/83,157 frames) to 95.0% (199,201/209,685 frames). Subsequent analyses revealed a phonation ratio between 44.0% (33,019/75,044 frames) and 78.0% (68,785/88,186 frames) during teaching sessions of 40-60 minutes; the durations of most of the phonation segments were less than 10 seconds. The presence of background noise reduced the accuracy of the automatic speech detection system, and an adjuvant noise reduction function could effectively improve the accuracy, especially under stable noise conditions.
This study demonstrated an average detection accuracy of 89.9% in the proposed automatic speech detection system with wireless microphones. The preliminary results for the phonation ratio were comparable to those of previous studies. Although the wireless microphones are susceptible to background noise, an additional noise reduction function can alleviate this limitation. These results indicate that the proposed system can be applied for ambulatory voice monitoring in occupational voice users.
Journal Article
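The abstract above describes detecting speech from the energy envelope with an adaptive threshold and then reporting a phonation ratio. The sketch below shows one simple way such a detector could look. The abstract does not specify the threshold function, so the rule used here (noise floor plus a fraction `alpha` of the observed energy range) is an assumption for illustration, as are the function names, the frame length, and the synthetic test signal.

```python
import math
from typing import List, Sequence


def frame_energies(samples: Sequence[float], frame_len: int) -> List[float]:
    """Mean-square energy of consecutive, non-overlapping frames."""
    return [
        sum(x * x for x in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def detect_speech(samples: Sequence[float], frame_len: int = 160,
                  alpha: float = 0.1) -> List[bool]:
    """Flag frames whose energy exceeds a threshold adapted to the recording.

    Assumed rule (not the paper's): threshold = floor + alpha * (peak - floor),
    where floor and peak are the minimum and maximum frame energies.
    """
    energies = frame_energies(samples, frame_len)
    if not energies:
        return []
    floor, peak = min(energies), max(energies)
    threshold = floor + alpha * (peak - floor)
    return [e > threshold for e in energies]


def phonation_ratio(flags: Sequence[bool]) -> float:
    """Fraction of frames flagged as speech."""
    return sum(flags) / len(flags) if flags else 0.0


# Synthetic check: ten quiet frames followed by ten loud frames at 8 kHz.
sample_rate = 8000
quiet = [0.01 * math.sin(2 * math.pi * 200 * n / sample_rate) for n in range(1600)]
loud = [0.5 * math.sin(2 * math.pi * 200 * n / sample_rate) for n in range(1600)]
flags = detect_speech(quiet + loud, frame_len=160)
print(phonation_ratio(flags))
```

A threshold derived from the recording itself adjusts to the overall level of each session, but as the abstract notes, background noise still degrades detection, which is why a separate noise reduction step was evaluated.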
Subcarrier selection for efficient CSI-based indoor localization
2018
Indoor positioning systems have received increasing attention for supporting location-based services. In recent Wi-Fi networks, the rich information in the physical layer, known as channel state information (CSI), has been recognized as a more effective positioning characteristic than traditional received signal strength. However, the positioning performance depends on a very high-dimensional CSI, because it covers all pairs of transceiver antennas, which may incur over-fitting problems. This paper proposes a subcarrier-selection approach based on information theoretic learning to compensate for over-fitting problems in CSI-based localization systems. After equalizing the histogram of CSIs, the proposed algorithm computes the information gain of each subcarrier and forms a new low-dimensional subset of CSIs to reduce the complexity and to decrease possible over-fitting caused by redundant CSIs. On-site experimental results demonstrate that the proposed approach outperforms traditional feature selection schemes.
Journal Article
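The abstract above outlines ranking subcarriers by information gain after histogram equalization and keeping a low-dimensional subset. A minimal sketch of that general idea follows, assuming equal-frequency (rank-based) binning as a stand-in for histogram equalization and location labels as the target variable. All function names, the bin count, and the toy data are hypothetical, and the paper's actual procedure may differ.

```python
import math
from collections import Counter, defaultdict
from typing import Hashable, List, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (bits) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def equal_frequency_bins(values: Sequence[float], n_bins: int) -> List[int]:
    """Assign each value a bin by rank, so bins hold roughly equal counts.

    Ties are split by original order, a simplification of this sketch.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // len(values)
    return bins


def information_gain(values: Sequence[float], labels: Sequence[Hashable],
                     n_bins: int = 4) -> float:
    """H(labels) - H(labels | binned values) for one subcarrier."""
    groups = defaultdict(list)
    for b, y in zip(equal_frequency_bins(values, n_bins), labels):
        groups[b].append(y)
    conditional = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def select_subcarriers(csi: Sequence[Sequence[float]], labels: Sequence[Hashable],
                       k: int, n_bins: int = 4) -> List[int]:
    """Indices of the k subcarriers with the highest information gain."""
    n_sub = len(csi[0])
    gains = [information_gain([row[j] for row in csi], labels, n_bins)
             for j in range(n_sub)]
    ranked = sorted(range(n_sub), key=lambda j: gains[j], reverse=True)
    return sorted(ranked[:k])


# Toy data: subcarrier 0 separates the two locations, subcarrier 1 does not.
labels = ["A", "A", "A", "A", "B", "B", "B", "B"]
csi = [[1, 1.0], [2, 10.0], [3, 2.0], [4, 11.0],
       [11, 1.5], [12, 10.5], [13, 2.5], [14, 11.5]]
print(select_subcarriers(csi, labels, k=1, n_bins=2))
```

Dropping subcarriers with near-zero gain shrinks the feature vector fed to the localization model, which is how a selection step of this kind can limit over-fitting.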
Robust Wi-Fi fingerprinting-based positioning in the presence of lying identities
by Yeh, Shih-Chun; Chuang, Chung-Chih; Fang, Shih-Hau
in Algorithms, Bayesian analysis, Fingerprinting
2018
The lying identity of an access point (AP) is one of the most serious threats in Wi-Fi positioning because an adversary can easily acquire a valid address by monitoring the transmission and masquerade as another AP in the network. This study proposes a robust Wi-Fi localization algorithm that can tolerate the liars instead of explicitly detecting them. The proposed algorithm considers all possible combinations of APs in a union-based approach such that adversaries cannot easily affect the positioning results by masquerading as APs. On-site experimental results demonstrate that this approach achieves more robust location estimation than the Bayesian approach and the cluster-based method in the presence of lying identities.
Journal Article
Fairness analysis of throughput and delay in WLAN environments with channel diversities
2011
The article investigates fairness in terms of throughput and packet delays among users with diverse channel conditions caused by mobility and fading effects in IEEE 802.11 wireless local area network (WLAN) environments. Our analytical results show that 802.11 CSMA/CA provides fairness among hosts with identical link qualities, regardless of whether equal or different data rates are applied. Our analytical results further demonstrate that diverse channel conditions can cause significant unfairness in both throughput and packet delays, even with a link adaptation mechanism, because the available modulation and coding schemes (MCSs) are limited. The simulation results validate the accuracy of our analytical model.
Journal Article