Catalogue Search | MBRL
Explore the vast range of titles available.
29 result(s) for "Xue, Fuzhao"
Underwater Acoustic Target Recognition: A Combination of Multi-Dimensional Fusion Features and Modified Deep Neural Network
2019
This paper proposes a method that combines multi-dimensional fusion features with a modified deep neural network (MFF-MDNN) to recognize underwater acoustic targets. Because the underwater environment is complex and changeable, it is difficult to describe underwater acoustic signals with a single feature; the Gammatone frequency cepstral coefficient (GFCC) and modified empirical mode decomposition (MEMD) are therefore developed to extract multi-dimensional features. Moreover, to ensure a consistent time dimension, a dimension-reduction method is proposed to obtain multi-dimensional fusion features from the original underwater acoustic signals. Then, to reduce redundant features and further improve recognition accuracy, a Gaussian mixture model (GMM) is used to modify the structure of a deep neural network (DNN). The proposed underwater acoustic target recognition method obtains an accuracy of 94.3% within a maximum of 800 iterations on a dataset containing underwater background noise with weak targets. Compared with other methods, the recognition results demonstrate that the proposed method has higher accuracy and strong adaptability.
Journal Article
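The abstract above describes fusing two feature streams after forcing them onto a common time dimension. A minimal sketch of that fusion step, assuming the features are frame-by-dimension matrices and that alignment is done by simple truncation/zero-padding (the paper's actual dimension-reduction method is not specified here):

```python
import numpy as np

def fuse_features(feat_a, feat_b, target_frames):
    """Align two feature matrices (frames x dims) to a common number of
    frames by truncating or zero-padding, then concatenate along the
    feature axis to form a fused multi-dimensional representation."""
    def align(f):
        if f.shape[0] >= target_frames:
            return f[:target_frames]          # truncate extra frames
        pad = np.zeros((target_frames - f.shape[0], f.shape[1]))
        return np.vstack([f, pad])            # pad missing frames
    return np.hstack([align(feat_a), align(feat_b)])
```

For example, fusing a 10-frame, 3-dim GFCC-like matrix with an 8-frame, 4-dim MEMD-like matrix at 9 target frames yields a 9x7 fused matrix.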
Recent advances in deep learning based dialogue systems: a systematic survey
2023
Dialogue systems are a popular natural language processing (NLP) task because they are promising in real-life applications. They are also complicated, since they involve many NLP tasks that deserve study in their own right. As a result, a multitude of novel works on this task have been carried out, most of them deep learning based due to their outstanding performance. In this survey, we mainly focus on deep learning based dialogue systems. We comprehensively review state-of-the-art research outcomes in dialogue systems and analyze them from two angles: model type and system type. Specifically, from the angle of model type, we discuss the principles, characteristics, and applications of different models that are widely used in dialogue systems. This will help researchers become acquainted with these models and see how they are applied in state-of-the-art frameworks, which is rather helpful when designing a new dialogue system. From the angle of system type, we discuss task-oriented and open-domain dialogue systems as two streams of research, providing insight into the related hot topics. Furthermore, we comprehensively review the evaluation methods and datasets for dialogue systems to pave the way for future research. Finally, some possible research trends are identified based on the recent research outcomes. To the best of our knowledge, this survey is the most comprehensive and up-to-date one at present for deep learning based dialogue systems, extensively covering the popular techniques. We believe this work is a good starting point for academics who are new to dialogue systems or those who want to quickly grasp up-to-date techniques in this area.
Journal Article
Towards Efficient Transformer Scaling
2024
In recent years, Transformer-based deep learning models have exhibited remarkable performance across a myriad of tasks. A pivotal advantage of the Transformer architecture lies in its scalability, spanning dimensions such as dataset size, parameter count, and computational budget. This scaling capability empowers Transformers to attain substantial improvements and even unlock novel capabilities, enabling the accomplishment of tasks previously deemed impossible. However, the pursuit of scaling comes at a considerable cost, limiting the progress of deep learning due to resource constraints. This thesis addresses this challenge by exploring a series of strategies to enhance the efficiency of Transformer scaling. Firstly, the introduction of more trainable parameters can significantly enhance performance but demands increased memory usage. To address this trade-off, we present WideNet, a model that optimizes parameter efficiency by leveraging parameter-sharing and Mixture-of-Experts, achieving superior results in both computer vision and natural language tasks. Secondly, when training different transformer models with distinct objectives at the same scale, we often adopt uniform configurations, such as width and depth. Our investigation into the relationship between transformer configuration and training objectives reveals that token-level training aligns better with deeper and narrower configurations, while sequence-level training encounters challenges in scaling depth due to over-smoothing. Motivated by real-world applications requiring processing of lengthy input sequences (e.g., document understanding and medical image processing), we focus on scaling the transformer along sequence length from a training system perspective. Our sequence parallelism approach achieves a 27× increase in maximum sequence length compared to previous methodologies. Transformers face limitations in handling fixed computation budgets at each scale, necessitating the deployment of multiple models at different scales to cater to diverse service levels. To address this, we introduce AdaTape, which enables adaptive computation with elastic input sequences, offering an improved cost-effectiveness trade-off and greater flexibility in utilizing foundation models. Lastly, recent insights from the transformer scaling community highlight the underestimated significance of dataset size. Rather than scaling trainable parameters faster than the dataset, achieving compute-optimal results requires a proportional scaling of model parameters and training tokens. Our exploration into dataset scaling reveals potential limitations in further scaling up large language models, prompting ongoing research into this emerging challenge.
Dissertation
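The thesis abstract above mentions scaling along sequence length via sequence parallelism. A toy sketch of that data layout (not the thesis's actual system): partition a long token sequence into contiguous chunks, one per worker, so each device holds only a fraction of the sequence:

```python
import math

def split_sequence(tokens, num_workers):
    """Partition a token sequence into contiguous, roughly equal chunks,
    one per worker. Each worker would then process (and exchange
    attention state for) only its local chunk."""
    chunk = math.ceil(len(tokens) / num_workers)
    return [tokens[i * chunk:(i + 1) * chunk] for i in range(num_workers)]
```

Splitting 10 tokens over 3 workers gives chunks of sizes 4, 4, and 2; no token is duplicated or dropped.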
GDPNet: Refining Latent Multi-View Graph for Relation Extraction
by Xue, Fuzhao; Chng, Eng Siong; Zhang, Hao
in Graph theory; Graphical representations; Modelling
2020
Relation Extraction (RE) aims to predict the relation type of two entities mentioned in a piece of text, e.g., a sentence or a dialogue. When the given text is long, it is challenging to identify indicative words for the relation prediction. Recent advances on the RE task come from BERT-based sequence modeling and graph-based modeling of relationships among the tokens in the sequence. In this paper, we propose to construct a latent multi-view graph to capture various possible relationships among tokens. We then refine this graph to select important words for relation prediction. Finally, the representation of the refined graph and the BERT-based sequence representation are concatenated for relation extraction. Specifically, in our proposed GDPNet (Gaussian Dynamic Time Warping Pooling Net), we utilize a Gaussian Graph Generator (GGG) to generate edges of the multi-view graph. The graph is then refined by Dynamic Time Warping Pooling (DTWPool). On DialogRE and TACRED, we show that GDPNet achieves the best performance on dialogue-level RE, and comparable performance with state-of-the-art methods on sentence-level RE.
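The GDPNet abstract describes generating soft graph edges between token representations. An illustrative sketch of one way such edges could be produced (a plain Gaussian kernel over pairwise distances; the paper's exact Gaussian Graph Generator is not reproduced here):

```python
import numpy as np

def gaussian_edges(token_reps, sigma=1.0):
    """Compute a soft adjacency matrix over token representations:
    edge weight = exp(-d^2 / (2*sigma^2)) for pairwise Euclidean
    distance d, so nearby tokens get edge weights close to 1."""
    dists = np.linalg.norm(
        token_reps[:, None, :] - token_reps[None, :, :], axis=-1
    )
    return np.exp(-dists ** 2 / (2 * sigma ** 2))
```

Each token's self-edge is exactly 1, and the matrix is symmetric, which is convenient for downstream graph pooling.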
An Embarrassingly Simple Model for Dialogue Relation Extraction
2022
Dialogue relation extraction (RE) aims to predict the relation type of two entities mentioned in a dialogue. In this paper, we propose a simple yet effective model named SimpleRE for the RE task. SimpleRE captures the interrelations among multiple relations in a dialogue through a novel input format named BERT Relation Token Sequence (BRS). In BRS, multiple [CLS] tokens are used to capture possible relations between different pairs of entities mentioned in the dialogue. A Relation Refinement Gate (RRG) is then designed to extract relation-specific semantic representation in an adaptive manner. Experiments on the DialogRE dataset show that SimpleRE achieves the best performance, with a much shorter training time. Further, SimpleRE outperforms all direct baselines on sentence-level RE without using external resources.
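The SimpleRE abstract describes an input format with one [CLS] token per candidate entity pair. A hypothetical sketch of constructing such a sequence (the exact BRS layout, including where entity mentions are appended, is an assumption here):

```python
def build_brs(dialogue_tokens, entity_pairs):
    """Build a BRS-style input: one [CLS] token per entity pair up
    front, then the dialogue tokens, then each pair's head and tail
    mentions after a [SEP] separator."""
    seq = ["[CLS]"] * len(entity_pairs)      # one slot per relation
    seq += dialogue_tokens
    for head, tail in entity_pairs:
        seq += ["[SEP]", head, tail]
    return seq
```

The i-th [CLS] position can then be read out as the representation for the i-th entity pair's relation.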
To Repeat or Not To Repeat: Insights from Scaling LLM under Token-Crisis
2023
Recent research has highlighted the importance of dataset size in scaling language models. However, large language models (LLMs) are notoriously token-hungry during pre-training, and high-quality text data on the web is approaching its scaling limit for LLMs. To further enhance LLMs, a straightforward approach is to repeat the pre-training data for additional epochs. In this study, we empirically investigate three key aspects under this approach. First, we explore the consequences of repeating pre-training data, revealing that the model is susceptible to overfitting, leading to multi-epoch degradation. Second, we examine the key factors contributing to multi-epoch degradation, finding that significant factors include dataset size, model parameters, and training objectives, while less influential factors consist of dataset quality and model FLOPs. Finally, we explore whether widely used regularization can alleviate multi-epoch degradation. Most regularization techniques do not yield significant improvements, except for dropout, which demonstrates remarkable effectiveness but requires careful tuning when scaling up the model size. Additionally, we discover that leveraging mixture-of-experts (MoE) enables cost-effective and efficient hyper-parameter tuning for computationally intensive dense LLMs with comparable trainable parameters, potentially impacting efficient LLM development on a broader scale.
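The token-crisis abstract above studies repeating pre-training data for extra epochs. A toy illustration of that setup (names and the epoch-counting convention are assumptions, not the paper's code): cycle a fixed corpus until a target token budget is met, returning the implied epoch count.

```python
def repeat_to_budget(dataset_tokens, token_budget):
    """Cycle a fixed corpus until the token budget is reached; return
    the (truncated) token stream and the number of epochs consumed.
    Multi-epoch repetition like this is what the paper finds can cause
    overfitting as epochs grow."""
    stream, epochs = [], 0
    while len(stream) < token_budget:
        stream.extend(dataset_tokens)
        epochs += 1
    return stream[:token_budget], epochs
```

For a 3-token corpus and an 8-token budget, the data is seen for 3 (partial) epochs, which is exactly the regime where the paper measures degradation.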
Automated Audio Captioning using Transfer Learning and Reconstruction Latent Space Similarity Regularization
2021
In this paper, we examine the use of Transfer Learning with Pretrained Audio Neural Networks (PANNs), and propose an architecture that is able to better leverage the acoustic features provided by PANNs for the Automated Audio Captioning task. We also introduce a novel self-supervised objective, Reconstruction Latent Space Similarity Regularization (RLSSR). The RLSSR module supplements the training of the model by minimizing the similarity between the encoder and decoder embeddings. The combination of both methods allows us to surpass state-of-the-art results by a significant margin on the Clotho dataset across several metrics and benchmarks.
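The RLSSR abstract describes a regularizer defined over the similarity between encoder and decoder embeddings. A hedged sketch of one plausible form of such a term (the paper's exact formulation is not given here): cosine similarity between pooled embeddings, used directly as an auxiliary loss.

```python
import numpy as np

def similarity_loss(enc_emb, dec_emb):
    """Auxiliary loss proportional to the cosine similarity between
    pooled encoder and decoder embeddings; minimizing it pushes the
    two latent spaces apart, as the abstract describes."""
    enc = enc_emb / np.linalg.norm(enc_emb)
    dec = dec_emb / np.linalg.norm(dec_emb)
    return float(np.dot(enc, dec))
```

Orthogonal embeddings give a loss of 0; perfectly aligned ones give 1.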
Hierarchical Dialogue Understanding with Special Tokens and Turn-level Attention
2023
Compared with standard text, understanding dialogue is more challenging for machines due to the dynamic and unexpected semantic changes in each turn. To model such inconsistent semantics, we propose a simple but effective Hierarchical Dialogue Understanding model, HiDialog. Specifically, we first insert multiple special tokens into a dialogue and propose turn-level attention to learn turn embeddings hierarchically. Then, a heterogeneous graph module is leveraged to polish the learned embeddings. We evaluate our model on various dialogue understanding tasks including dialogue relation extraction, dialogue emotion recognition, and dialogue act classification. Results show that our simple approach achieves state-of-the-art performance on all three tasks above. All our source code is publicly available at https://github.com/ShawX825/HiDialog.
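The HiDialog abstract describes inserting special tokens into the dialogue so that turn embeddings can be learned hierarchically. A minimal sketch of that input construction (the special-token name "[TURN]" is an assumption for illustration):

```python
def insert_turn_tokens(turns, turn_token="[TURN]"):
    """Prefix each dialogue turn with a special token and record its
    position; the hidden state at each such position could later serve
    as that turn's embedding for turn-level attention."""
    seq, turn_positions = [], []
    for turn in turns:
        turn_positions.append(len(seq))   # where this turn's token sits
        seq.append(turn_token)
        seq.extend(turn.split())
    return seq, turn_positions
```

Given two turns "hello there" and "hi", the special tokens land at positions 0 and 3 of the flattened sequence.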
Boosting LLM via Learning from Data Iteratively and Selectively
2024
Datasets nowadays are generally constructed from multiple sources and using different synthetic techniques, making data de-noising and de-duplication crucial before they are used for post-training. In this work, we propose to perform instruction tuning by iterative data selection (IterIT). We measure the quality of a sample by its complexity and diversity simultaneously. Instead of calculating the complexity score once and for all before fine-tuning, we highlight the importance of updating this model-specific score during fine-tuning to accurately accommodate the dynamic changes of the model. On the other hand, the diversity score is defined on top of the samples' responses with consideration of their informativeness. IterIT integrates the strengths of both worlds by iteratively updating the complexity score for the top-ranked samples and greedily selecting the ones with the highest complexity-diversity score. Experiments on multiple instruction-tuning datasets demonstrate consistent improvements of IterIT over strong baselines. Moreover, our approach also generalizes well to domain-specific scenarios and different backbone models. All resources will be available at https://github.com/JiaQiSJTU/IterIT.
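The IterIT abstract describes greedily selecting samples by a combined complexity-diversity score, with diversity measured against what has already been selected. A hedged sketch of that loop (the scoring functions here are numeric stand-ins, not the paper's model-based scores):

```python
def select_samples(samples, k):
    """Greedy complexity-diversity selection: treat each numeric sample
    as its own complexity proxy, and use distance to the closest
    already-selected sample as a diversity proxy. Diversity is
    recomputed each round, mirroring the iterative update."""
    selected, pool = [], list(samples)
    while pool and len(selected) < k:
        def score(s):
            diversity = min((abs(s - t) for t in selected), default=0.0)
            return s + 0.5 * diversity
        best = max(pool, key=score)
        selected.append(best)
        pool.remove(best)
    return selected
```

On [1, 5, 3, 9] with k=2, the first pick is 9 (highest complexity), and the second is 5 once diversity against 9 is factored in.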