684 results for "Li, Haizhou"
Deep Spiking Neural Networks for Large Vocabulary Automatic Speech Recognition
Artificial neural networks (ANN) have become the mainstream acoustic modeling technique for large vocabulary automatic speech recognition (ASR). A conventional ANN features a multi-layer architecture that requires massive amounts of computation. Brain-inspired spiking neural networks (SNN) closely mimic biological neural networks and can operate on low-power neuromorphic hardware with spike-based computation. Motivated by their unprecedented energy efficiency and rapid information-processing capability, we explore the use of SNNs for speech recognition. In this work, we use SNNs for acoustic modeling and evaluate their performance on several large vocabulary recognition scenarios. The experimental results demonstrate ASR accuracies competitive with their ANN counterparts, while requiring only 10 algorithmic time steps and as few as 0.68 times the total synaptic operations to classify each audio frame. Integrating the algorithmic power of deep SNNs with energy-efficient neuromorphic hardware therefore offers an attractive solution for ASR applications running locally on mobile and embedded devices.
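As a hedged illustration of the spike-based computation this abstract refers to, the sketch below simulates a single leaky integrate-and-fire (LIF) neuron over 10 discrete time steps, matching the per-frame window length reported above. All parameter values (`v_thresh`, `leak`, the input currents) are illustrative assumptions, not taken from the paper.

```python
def lif_simulate(input_current, v_thresh=1.0, v_reset=0.0, leak=0.9):
    """Simulate one leaky integrate-and-fire neuron; return its spike train."""
    v = 0.0
    spikes = []
    for i_t in input_current:
        v = leak * v + i_t      # leaky integration of the input current
        if v >= v_thresh:       # threshold crossing emits a spike
            spikes.append(1)
            v = v_reset         # membrane potential resets after a spike
        else:
            spikes.append(0)
    return spikes

# 10 algorithmic time steps per audio frame, as in the abstract
current = [0.4, 0.5, 0.3, 0.6, 0.2, 0.7, 0.4, 0.1, 0.5, 0.6]
print(lif_simulate(current))
```

The spike count and timing, rather than a continuous activation, carry the neuron's output, which is what makes the computation cheap on neuromorphic hardware.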
Influence of co-blending fly ash and ceramic waste powder on the performance and microstructure of cementitious substrates under sulfate dry-wet cycle attack
This study examines the properties of cement-based materials incorporating composite additions of fly ash and ceramic waste powder (CWP) as supplementary cementitious materials (SCM). The resistance of the materials to sulfate erosion under dry-wet cycling conditions was investigated through experimental testing. A Box-Behnken Design was employed to establish a model using three factors: the replacement ratio of cement by SCMs, the mass ratio of CWP to SCMs, and the water-to-binder ratio. The response variable was the mass loss rate due to sulfate erosion after 24 cycles of dry-wet cycling. Significance analysis of single-factor and multiple-factor interactions was conducted based on the response surface model. The research findings indicate that the cement-based materials with combined additions of fly ash and CWP exhibit optimal resistance to sulfate erosion under dry-wet cycling conditions. The water-to-binder ratio was identified as the most significant factor affecting the corrosion resistance of the cement-based materials at 7 days of curing. The dosage of ceramic waste powder influenced the corrosion performance of the cement-based materials at 28 days of curing. The content of SCMs affected the corrosion resistance of the cement-based materials after 56 days of curing. Comparative analysis of the grayscale three-dimensional distribution map and histogram of the cement-based materials with SCMs revealed an increase in the compactness of the matrix.
Precise-Spike-Driven Synaptic Plasticity: Learning Hetero-Association of Spatiotemporal Spike Patterns
A new learning rule, Precise-Spike-Driven (PSD) Synaptic Plasticity, is proposed for processing and memorizing spatiotemporal patterns. PSD is a supervised learning rule that is analytically derived from the traditional Widrow-Hoff rule and can be used to train neurons to associate an input spatiotemporal spike pattern with a desired spike train. Synaptic adaptation is driven by the error between the desired and the actual output spikes, with positive errors causing long-term potentiation and negative errors causing long-term depression. The amount of modification is proportional to an eligibility trace that is triggered by afferent spikes. The PSD rule is both computationally efficient and biologically plausible. The properties of this learning rule are investigated extensively through experimental simulations, including its learning performance, its generality to different neuron models, its robustness against noisy conditions, its memory capacity, and the effects of its learning parameters. Experimental results show that the PSD rule is capable of spatiotemporal pattern classification, and can even outperform a well-studied benchmark algorithm with the proposed relative confidence criterion. The PSD rule is further validated on a practical example of an optical character recognition problem. The results again show that it can achieve good recognition performance with a proper encoding. Finally, a detailed discussion is provided about the PSD rule and several related algorithms, including tempotron, SPAN, Chronotron and ReSuMe.
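The error-driven update described in this abstract can be sketched as follows. This is a minimal, illustrative reading of the rule (the sign of the spike error gates potentiation vs. depression, and an afferent-spike-driven eligibility trace scales the change per synapse), not the authors' exact formulation; the names and parameter values (`lr`, `tau`) are assumptions.

```python
import math

def psd_style_update(w, pre, desired, actual, lr=0.1, tau=5.0):
    """Illustrative PSD-style weight update over one trial.

    desired/actual: binary output spike trains; pre: one binary spike
    train per synapse. A positive error (missed desired spike) causes
    potentiation, a negative error (spurious spike) causes depression,
    each scaled by the synapse's eligibility trace.
    """
    decay = math.exp(-1.0 / tau)   # exponential trace decay per step
    traces = [0.0] * len(w)
    w = list(w)
    for t in range(len(desired)):
        for i in range(len(w)):
            traces[i] = traces[i] * decay + pre[i][t]  # afferent spike bumps trace
        err = desired[t] - actual[t]   # +1 -> LTP, -1 -> LTD, 0 -> no change
        for i in range(len(w)):
            w[i] += lr * err * traces[i]
    return w

# Two synapses, four time steps: the neuron spiked too early (t=1)
# and missed the desired spike (t=2), so both weights are adjusted.
new_w = psd_style_update([0.5, 0.5], [[1, 0, 0, 1], [0, 1, 0, 0]],
                         desired=[0, 0, 1, 0], actual=[0, 1, 0, 0])
print(new_w)
```

Because the trace decays exponentially, recent afferent spikes dominate the correction, which is what lets the rule bind output spikes to precise input timings.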
A Spiking Neural Network Framework for Robust Sound Classification
Environmental sounds form part of our daily life. With the advancement of deep learning models and the abundance of training data, the performance of automatic sound classification (ASC) systems has improved significantly in recent years. However, the high computational cost, and hence high power consumption, remains a major hurdle for large-scale implementation of ASC systems on mobile and wearable devices. Motivated by the observation that humans are highly effective and consume little power while analyzing complex audio scenes, we propose a biologically plausible ASC framework, namely SOM-SNN. This framework uses an unsupervised self-organizing map (SOM) to represent the frequency content embedded within acoustic signals, followed by an event-based spiking neural network (SNN) for spatiotemporal spiking pattern classification. We report experimental results on the RWCP environmental sound and TIDIGITS spoken digit datasets, which demonstrate classification accuracies competitive with other deep learning and SNN-based models. The SOM-SNN framework is also shown to be highly robust to corrupting noise after multi-condition training, whereby the model is trained with noise-corrupted sound samples. Moreover, we discover an early decision-making capability of the proposed framework: an accurate classification can be made with only a partial presentation of the input.
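A minimal sketch of the self-organizing-map building block mentioned above: one SOM update finds the best matching unit (BMU) and pulls it and its neighbors toward the input. This generic one-dimensional SOM is illustrative only; in the paper the SOM organizes frequency representations of audio, and all names and parameters here are assumptions.

```python
import math

def som_step(weights, x, lr=0.5, sigma=1.0):
    """One self-organizing-map update: locate the BMU for input x and
    move nearby units toward x, weighted by a Gaussian neighborhood."""
    # BMU = unit whose weight vector is closest to the input
    dists = [sum((wi - xi) ** 2 for wi, xi in zip(w, x)) for w in weights]
    bmu = dists.index(min(dists))
    for j, w in enumerate(weights):
        h = math.exp(-((j - bmu) ** 2) / (2 * sigma ** 2))  # neighborhood factor
        weights[j] = [wi + lr * h * (xi - wi) for wi, xi in zip(w, x)]
    return bmu, weights

# Two units on a 1-D map; the input is closest to unit 1
bmu, w = som_step([[0.0, 0.0], [1.0, 1.0]], [1.0, 1.0])
print(bmu, w)
```

Repeating this step over many inputs yields a topology-preserving map, so nearby units come to represent similar frequency patterns.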
Is Neuromorphic MNIST Neuromorphic? Analyzing the Discriminative Power of Neuromorphic Datasets in the Time Domain
A major characteristic of spiking neural networks (SNNs) over conventional artificial neural networks (ANNs) is their ability to spike, enabling them to use spike timing for coding and efficient computing. In this paper, we assess whether neuromorphic datasets recorded from static images can evaluate the ability of SNNs to use spike timing in their computations. We analyze N-MNIST, N-Caltech101 and DvsGesture along these lines, but focus our study on N-MNIST. First, we evaluate whether additional information is encoded in the time domain in a neuromorphic dataset. We show that an ANN trained with backpropagation on frame-based versions of the N-MNIST and N-Caltech101 images achieves 99.23% and 78.01% accuracy, respectively. These results are comparable to the state of the art, showing that an algorithm that works purely on spatial data can classify these datasets. Second, we compare N-MNIST and DvsGesture on two STDP algorithms: RD-STDP, which can classify only spatial data, and STDP-tempotron, which classifies spatiotemporal data. We demonstrate that RD-STDP performs very well on N-MNIST, while STDP-tempotron performs better on DvsGesture. Since DvsGesture has a temporal dimension, it requires STDP-tempotron, while N-MNIST can be adequately classified by an algorithm that works on spatial data alone. This shows that precise spike timings are not important in N-MNIST, and the dataset does not, therefore, highlight the ability of SNNs to classify temporal data. The conclusions of this paper open the question: what dataset can evaluate the ability of SNNs to classify temporal data?
On the study of replay and voice conversion attacks to text-dependent speaker verification
Automatic speaker verification (ASV) is the task of automatically accepting or rejecting a claimed identity based on a speech sample. Recently, individual studies have confirmed the vulnerability of state-of-the-art text-independent ASV systems to replay, speech synthesis and voice conversion attacks on various databases. However, the behaviour of text-dependent ASV systems has not been systematically assessed in the face of various spoofing attacks. In this work, we first conduct a systematic analysis of text-dependent ASV systems under replay and voice conversion attacks using the same protocol and database, in particular the RSR2015 database, which represents mobile device quality speech. We then analyse the interplay of voice conversion and speaker verification by linking voice conversion objective evaluation measures with speaker verification error rates, to examine the vulnerabilities from the perspective of voice conversion.
Study on Chemical Constituents of Panax notoginseng Leaves
Panax notoginseng (Burk.) F. H. is a genuine medicinal material from Yunnan Province. As accessory parts of the plant, P. notoginseng leaves mainly contain protopanaxadiol saponins. Preliminary findings indicate that P. notoginseng leaves exert significant pharmacological effects and have been administered as a tranquilizer and to treat cancer and nerve injury. Saponins from P. notoginseng leaves were isolated and purified by different chromatographic methods, and the structures of compounds 1–22 were elucidated mainly through comprehensive analyses of spectroscopic data. Moreover, the SH-SY5Y cell-protective bioactivities of all isolated compounds were tested in an L-glutamate model of nerve cell injury. As a result, twenty-two saponins were identified, including eight dammarane saponins, namely notoginsenosides SL1-SL8 (1–8), identified as new compounds, together with fourteen known compounds, namely notoginsenoside NL-A3 (9), ginsenoside Rc (10), gypenoside IX (11), gypenoside XVII (12), notoginsenoside Fc (13), quinquenoside L3 (14), notoginsenoside NL-B1 (15), notoginsenoside NL-C2 (16), notoginsenoside NL-H2 (17), notoginsenoside NL-H1 (18), vina-ginsenoside R13 (19), ginsenoside II (20), majoroside F4 (21), and notoginsenoside LK4 (22). Among them, notoginsenoside SL1 (1), notoginsenoside SL3 (3), notoginsenoside NL-A3 (9), and ginsenoside Rc (10) showed slight protective effects against L-glutamate-induced nerve cell injury (30 µM).
Efficient and robust temporal processing with neural oscillations modulated spiking neural networks
The brain exhibits rich dynamical properties that underpin its remarkable temporal processing capabilities. However, spiking neural networks (SNNs) inspired by the brain have not yet matched their biological counterparts in temporal processing and remain vulnerable to noise perturbations. This study addresses these limitations by introducing Rhythm-SNN, which draws inspiration from the brain's neural oscillation mechanism. Specifically, we employ heterogeneous oscillatory signals to modulate spiking neurons, constraining them to activate periodically at distinct frequencies. This approach not only significantly reduces neuronal firing rates but also enhances the capability and robustness of SNNs in temporal processing. Extensive experiments and theoretical analyses demonstrate that Rhythm-SNN achieves state-of-the-art performance across a broad range of tasks, with a markedly reduced energy cost, even under strong perturbations. Notably, in the Intel Neuromorphic Deep Noise Suppression Challenge, Rhythm-SNN outperforms deep learning solutions by achieving more than two orders of magnitude of energy reduction while delivering award-winning denoising performance.
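The oscillatory modulation idea can be illustrated with a toy periodic mask: each neuron is permitted to fire only during the "on" phase of its own oscillation, which both thins out spikes and staggers the neurons across distinct frequencies. This is a loose sketch of the mechanism described above, with all names and parameters assumed, not the authors' implementation.

```python
def rhythm_mask(num_neurons, num_steps, periods, duty=0.5):
    """Build per-neuron periodic on/off masks: neuron n may fire only
    while its oscillation (period periods[n % len(periods)]) is 'on'."""
    masks = []
    for n in range(num_neurons):
        p = periods[n % len(periods)]
        masks.append([1 if (t % p) < duty * p else 0
                      for t in range(num_steps)])
    return masks

# Neuron 0 oscillates with period 4, neuron 1 with period 2
masks = rhythm_mask(2, 8, periods=[4, 2])
raw_spikes = [1, 1, 1, 1, 1, 1, 1, 1]
gated = [s * m for s, m in zip(raw_spikes, masks[0])]  # mask suppresses half
print(masks, gated)
```

Gating spikes this way cuts the firing rate roughly by the duty cycle, which is one intuition for the energy reductions the abstract reports.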
An Efficient and Perceptually Motivated Auditory Neural Encoding and Decoding Algorithm for Spiking Neural Networks
The auditory front-end is an integral part of a spiking neural network (SNN) performing auditory cognitive tasks. It encodes a temporally dynamic stimulus, such as speech or audio, into an efficient, effective and reconstructable spike pattern to facilitate subsequent processing. However, most auditory front-ends in current studies have not made use of recent findings in psychoacoustics and physiology concerning human listening. In this paper, we propose a neural encoding and decoding scheme that is optimized for audio processing. The neural encoding scheme, which we call Biologically plausible Auditory Encoding (BAE), emulates the functions of the perceptual components of the human auditory system, including the cochlear filter bank, the inner hair cells, auditory masking effects from psychoacoustic models, and spike-based neural encoding by the auditory nerve. We evaluate the perceptual quality of the BAE scheme using PESQ, and its performance through sound classification and speech recognition experiments. Finally, we also build and publish spike versions of two speech datasets, Spike-TIDIGITS and Spike-TIMIT, for researchers to use in benchmarking future SNN research.
Making Social Robots More Attractive: The Effects of Voice Pitch, Humor and Empathy
In this paper we explore how simple auditory/verbal features of spoken language, such as voice characteristics (pitch) and language cues (empathy/humor expression), influence the quality of interaction with a social robot receptionist. For our experiment, two robot characters were created: Olivia, the more extroverted, exuberant, and humorous robot with a higher voice pitch, and Cynthia, the more introverted, calmer, and more serious robot with a lower voice pitch. Our results showed that voice pitch had a strong influence on how users rated the overall interaction quality, as well as the robot's appeal and overall enjoyment. Further, humor appeared to improve users' perception of task enjoyment, robot personality and speaking style, while empathy affected how users evaluated the robot's receptive behavior and the ease of interaction. With our study, we would like to stress in particular the importance of voice pitch in human-robot interaction and to encourage further research on this topic.