Catalogue Search | MBRL
185 result(s) for "Shang-Wen, Li"
TERA: Self-Supervised Learning of Transformer Encoder Representation for Speech
2021
We introduce a self-supervised speech pre-training method called TERA, which stands for Transformer Encoder Representations from Alteration. Recent approaches often learn by using a single auxiliary task like contrastive prediction, autoregressive prediction, or masked reconstruction. Unlike previous methods, we use alteration along three orthogonal axes to pre-train Transformer Encoders on a large amount of unlabeled speech. The model learns through the reconstruction of acoustic frames from their altered counterparts, where a stochastic policy alters them along three dimensions: time, frequency, and magnitude. TERA can be used for speech representation extraction or fine-tuning with downstream models. We evaluate TERA on several downstream tasks, including phoneme classification, keyword spotting, speaker recognition, and speech recognition. We present a large-scale comparison of various self-supervised models, in which TERA achieves strong performance, improving upon surface features and outperforming previous models. In our experiments, we study the effect of applying different alteration techniques, pre-training on more data, and pre-training on various features. We analyze different model sizes and find that smaller models are stronger representation learners than larger models, while larger models are more effective for downstream fine-tuning than smaller models. Furthermore, we show that the proposed method is transferable to downstream datasets not used in pre-training.
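The three-axis alteration the abstract describes can be sketched as follows. This is a minimal illustration, not the paper's exact policy: the span sizes, noise level, and the decision to apply every alteration on each call are assumptions for clarity.

```python
import numpy as np

def alter(features, time_p=0.15, freq_p=0.2, noise_std=0.1, rng=None):
    """Alter an utterance's feature matrix (frames x channels) along the
    three orthogonal axes named in the abstract: magnitude (Gaussian
    noise), time (a masked span of frames), and frequency (a masked band
    of channels). The model would be trained to reconstruct the original
    features from this altered copy."""
    rng = rng if rng is not None else np.random.default_rng(0)
    x = features.copy()
    T, F = x.shape
    # Magnitude alteration: perturb every frame with Gaussian noise.
    x += rng.normal(0.0, noise_std, size=x.shape)
    # Time alteration: zero out a contiguous span of frames.
    span = max(1, int(T * time_p))
    start = rng.integers(0, T - span + 1)
    x[start:start + span, :] = 0.0
    # Frequency alteration: zero out a contiguous band of channels.
    band = max(1, int(F * freq_p))
    lo = rng.integers(0, F - band + 1)
    x[:, lo:lo + band] = 0.0
    return x
```

In the paper's setting each alteration would be applied stochastically per utterance rather than unconditionally as here.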
SpeechPrompt: Prompting Speech Language Models for Speech Processing Tasks
by
Shen, Hua
,
Kang, Iu-thing
,
Wei-Cheng, Tseng
in
Harnesses
,
Parameter modification
,
Prompt engineering
2024
Prompting has become a practical method for utilizing pre-trained language models (LMs). This approach offers several advantages. It allows an LM to adapt to new tasks with minimal training and parameter updates, thus achieving efficiency in both storage and computation. Additionally, prompting modifies only the LM's inputs and harnesses the generative capabilities of language models to address various downstream tasks in a unified manner. This significantly reduces the need for human labor in designing task-specific models. These advantages become even more evident as the number of tasks served by the LM scales up. Motivated by the strengths of prompting, we are the first to explore the potential of prompting speech LMs in the domain of speech processing. Recently, there has been a growing interest in converting speech into discrete units for language modeling. Our pioneering research demonstrates that these quantized speech units are highly versatile within our unified prompting framework. Not only can they serve as class labels, but they also contain rich phonetic information that can be re-synthesized back into speech signals for speech generation tasks. Specifically, we reformulate speech processing tasks into speech-to-unit generation tasks. As a result, we can seamlessly integrate tasks such as speech classification, sequence generation, and speech generation within a single, unified prompting framework. The experimental results show that the prompting method can achieve competitive performance compared to the strong fine-tuning method based on self-supervised learning models with a similar number of trainable parameters. The prompting method also shows promising results in the few-shot setting. Moreover, as more advanced speech LMs come onto the stage, the proposed prompting framework gains even greater potential.
GSQA: An End-to-End Model for Generative Spoken Question Answering
2024
In recent advancements in spoken question answering (QA), end-to-end models have made significant strides. However, previous research has primarily focused on extractive span selection. While this extractive-based approach is effective when answers are present directly within the input, it falls short in addressing abstractive questions, where answers are not directly extracted but inferred from the given information. To bridge this gap, we introduce the first end-to-end Generative Spoken Question Answering (GSQA) model that empowers the system to engage in abstractive reasoning. The challenge in training our GSQA model lies in the absence of a spoken abstractive QA dataset. We propose using text models for initialization and leveraging the extractive QA dataset to transfer knowledge from the text generative model to the spoken generative model. Experimental results indicate that our model surpasses the previous extractive model by 3% on extractive QA datasets. Furthermore, the GSQA model has only been fine-tuned on the spoken extractive QA dataset. Despite not having seen any spoken abstractive QA data, it can still closely match the performance of the cascade model. In conclusion, our GSQA model shows the potential to generalize to a broad spectrum of questions, thus further expanding the spoken question answering capabilities of abstractive QA. Our code is available at https://voidful.github.io/GSQA
Self-Supervised Speech Representation Learning: A Review
by
Edin, Joakim
,
Kirchhoff, Katrin
,
Hung-yi, Lee
in
Automatic speech recognition
,
Computer vision
,
Data transmission
2022
Although supervised deep learning has revolutionized speech and audio processing, it has necessitated the building of specialist models for individual tasks and application scenarios. It is likewise difficult to apply this to dialects and languages for which only limited labeled data is available. Self-supervised representation learning methods promise a single universal model that would benefit a wide variety of tasks and domains. Such methods have shown success in natural language processing and computer vision domains, achieving new levels of performance while reducing the number of labels required for many downstream scenarios. Speech representation learning is experiencing similar progress in three main categories: generative, contrastive, and predictive methods. Other approaches rely on multi-modal data for pre-training, mixing text or visual data streams with speech. Although self-supervised speech representation is still a nascent research area, it is closely related to acoustic word embedding and learning with zero lexical resources, both of which have seen active research for many years. This review presents approaches for self-supervised speech representation learning and their connection to other research areas. Since many current methods focus solely on automatic speech recognition as a downstream task, we review recent efforts on benchmarking learned representations to extend the application beyond speech recognition.
Population pharmacokinetics of clozapine and its primary metabolite norclozapine in Chinese patients with schizophrenia
by
Li-jun LI, De-wei SHANG, Wen-biao LI, Wei GUO, Xi-pei WANG, Yu-peng REN, An-ning LI, Pei-xin FU, Shuang-min JI, Wei LU, Chuan-yue WANG
in
Adolescent
,
Adult
,
Antipsychotic Agents - pharmacokinetics
2012
Aim: To develop a combined population pharmacokinetic (PPK) model to assess the magnitude and variability of exposure to both clozapine and its primary metabolite norclozapine in Chinese patients with refractory schizophrenia via sparse sampling, with a focus on the effects of covariates on the pharmacokinetic parameters. Methods: Relevant patient concentration data (eg, demographic data, medication history, dosage regimen, time of last dose, sampling time, concentrations of clozapine and norclozapine, etc) were collected using a standardized data collection form. The demographic characteristics of the patients, including sex, age, weight, body surface area, smoking status, and information on concomitant medications, as well as biochemical and hematological test results, were recorded. Persons who had smoked 5 or more cigarettes per day within the last week were defined as smokers. The concentrations of clozapine and norclozapine were measured using an HPLC system equipped with a UV detector. PPK analysis was performed using NONMEM. Age, weight, sex, and smoking status were evaluated as main covariates. The model was internally validated using normalized prediction distribution errors. Results: A total of 809 clozapine concentration data sets and 808 norclozapine concentration data sets from 162 inpatients (74 males, 88 females) at multiple mental health sites in China were included. A one-compartment pharmacokinetic model with mixture error best described the concentration-time profiles of clozapine and norclozapine. The population-predicted clearances of clozapine and norclozapine in female nonsmokers were 21.9 and 32.7 L/h, respectively. The population-predicted volumes of distribution for clozapine and norclozapine were 526 and 624 L, respectively. Smoking was significantly associated with increases in clearance (clozapine by 45%; norclozapine by 54.3%). The clearance was significantly greater in males than in females (clozapine by 20.8%; norclozapine by 24.2%). The clearance of clozapine and norclozapine did not differ significantly between Chinese patients and American patients. Conclusion: Smoking and male sex were significantly associated with a lower exposure to clozapine and norclozapine due to higher clearance. This model can be used in individualized drug dosing and therapeutic drug monitoring.
Journal Article
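The covariate effects reported in the abstract can be turned into a small dosing sketch. The typical clearance (21.9 L/h for female nonsmokers) and the covariate multipliers (+20.8% for males, +45% for smokers) come from the abstract; the dosing numbers and the standard average steady-state formula Css,avg = F·Dose/(CL·τ) are illustrative, not from the study.

```python
def clozapine_clearance(male: bool, smoker: bool, base_cl: float = 21.9) -> float:
    """Covariate-adjusted clozapine clearance (L/h), using the typical
    value and covariate effects reported in the abstract."""
    cl = base_cl
    if male:
        cl *= 1.208   # clearance 20.8% higher in males
    if smoker:
        cl *= 1.45    # clearance 45% higher in smokers
    return cl

def css_average(dose_mg: float, tau_h: float, cl_l_per_h: float,
                bioavailability: float = 1.0) -> float:
    """Average steady-state concentration (mg/L) under repeated dosing:
    Css,avg = F * Dose / (CL * tau)."""
    return bioavailability * dose_mg / (cl_l_per_h * tau_h)
```

This makes the abstract's conclusion concrete: a male smoker's clearance is about 1.208 × 1.45 ≈ 1.75 times the female nonsmoker value, so the same regimen yields proportionally lower average exposure.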
Modbus/TCP Communication Anomaly Detection Based on PSO-SVM
by
Wan, Ming
,
Shang, Wen Li
,
Zhang, Sheng Shan
in
Anomalies
,
Computer information security
,
Control systems
2014
Industrial firewalls and intrusion detection systems based on Modbus TCP protocol analysis and whitelist policies cannot effectively identify attacks on a Modbus controller that exploit precisely the configured rules. An industrial control system simulation environment is established, and a preprocessing method for captured Modbus TCP traffic is designed to meet the needs of the anomaly detection module. Furthermore, a Modbus function code sequence anomaly detection model based on an SVM optimized by the PSO method is designed. The model can effectively identify abnormal Modbus TCP traffic according to the frequency of different short mode sequences in a Modbus function code sequence.
Journal Article
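The feature the abstract describes, frequencies of short mode sequences within a function code sequence, can be sketched as an n-gram frequency vector; such vectors would then be fed to the PSO-tuned SVM classifier. The choice n=3 is an assumption for illustration, as the abstract does not give the sequence length.

```python
from collections import Counter

def ngram_frequencies(codes, n=3):
    """Relative frequencies of short function-code subsequences
    (n-grams) in a Modbus function code sequence. Each traffic window
    becomes one such frequency vector, the input to the SVM whose
    hyperparameters (e.g. C, gamma) the PSO would tune."""
    total = len(codes) - n + 1
    counts = Counter(tuple(codes[i:i + n]) for i in range(total))
    return {gram: c / total for gram, c in counts.items()}
```

A window whose n-gram distribution deviates sharply from those seen during normal operation would be flagged as anomalous by the trained classifier.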
The Forced Response Analysis of Heavy Duty Vehicle’s Cab
by
Liu, Jun Liang
,
Long, Yu Hong
,
Cai, Jie
in
Automotive engines
,
Automotive wheels
,
Crashworthiness
2014
An automobile sustains various excitations from outside and inside while being driven, among which the impact of the wheels and the vibration of the engine dominate. This paper obtains the natural frequencies through modal analysis of a finite element model of a heavy vehicle's cab, and then conducts a forced response analysis on the finite element model under a condition in which the vibration of the cab is caused by external excitation. The analysis identifies the main response regions of the cab under external excitation, and provides a theoretical basis for controlling cab noise in the future.
Journal Article
Simulation and Optimization for the Cooling System of Heavy Vehicle Engine
by
Zhan, Xin
,
Long, Yu Hong
,
Liu, Jun Liang
in
Computer programs
,
Computer simulation
,
Cooling systems
2014
The engine cooling system of a heavy vehicle was used as the research object in this paper. The engine system model was built with KULI software according to wind tunnel test data of the heat exchanger provided by the supplier, and an optimization program was derived from the simulation results. The result shows that the cooling system of a heavy vehicle engine can be simulated with KULI software early in the design stage. The study provides a theoretical basis for the design of heavy vehicle engine cooling systems, improving efficiency and saving cost.
Journal Article
Simulation and Optimization Design of Gasoline Engine Exhaust Muffler
by
Du, Huai Hui
,
Long, Yu Hong
,
Liu, Jun Liang
in
Computer simulation
,
Coupling
,
Design optimization
2014
The coupled simulation model of a gasoline engine's working process and muffler is established with GT-Power software, and related data are obtained by simulation calculation. The calculated results are consistent with the measured data, which validates the simulation model. Through analysis of the coupled simulation model, an improved design for the exhaust muffler is put forward in this paper. Based on local optimization, the simulation results show that the insertion loss of the muffler improves markedly and the exhaust noise decreases, while the engine's indicated power remains basically unchanged.
Journal Article
Research on Efficient Association Analysis Algorithm towards Production Process Data
2014
Association analysis of production process data (PPD) can discover the quality-relevant parameters with the greatest impact. However, it differs from correlation analysis in other fields: the production process generates huge amounts of data and involves many parameters, so existing association analysis algorithms are too inefficient to meet the needs of practical application. This paper proposes a new efficient association analysis algorithm for industrial production process data, AprioriMask. Verification on association analysis of an actual production process shows that AprioriMask delivers significant performance improvements and meets the needs of correlation analysis of industrial production process data.
Journal Article
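For context, the baseline the abstract builds on can be sketched as plain Apriori frequent-itemset mining; the abstract does not describe AprioriMask's masking optimizations, so this shows only the classic level-wise algorithm it improves upon.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Classic level-wise Apriori: find all itemsets whose support
    (fraction of transactions containing them) meets min_support.
    Candidate (k+1)-itemsets are joined from frequent k-itemsets,
    so infrequent branches are never expanded."""
    n = len(transactions)
    sets = [frozenset(t) for t in transactions]
    items = {i for t in sets for i in t}
    current = {frozenset([i]) for i in items}
    frequent = {}
    k = 1
    while current:
        # Count support of each candidate in one pass over the data.
        counts = {c: sum(1 for t in sets if c <= t) for c in current}
        level = {c: cnt / n for c, cnt in counts.items()
                 if cnt / n >= min_support}
        frequent.update(level)
        # Join surviving k-itemsets into (k+1)-itemset candidates.
        keys = list(level)
        current = {a | b for a, b in combinations(keys, 2)
                   if len(a | b) == k + 1}
        k += 1
    return frequent
```

In the production setting the abstract describes, each transaction would be one production record and each item a discretized parameter value; the paper's contribution is making this mining efficient at that scale.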