Catalogue Search | MBRL
Search Results
Explore the vast range of titles available.
20 result(s) for "Saijo, Kohei"
Multicenter prospective phase II trial of concurrent chemoradiotherapy with weekly low-dose carboplatin for cisplatin-ineligible patients with advanced head and neck squamous cell carcinoma
by Okabe, Ryuichi; Yamazaki, Keisuke; Sato, Yuichiro
in Adverse events, Alanine transaminase, Carboplatin
2024
Background: The optimal chemotherapy regimen in concurrent chemoradiotherapy (CCRT) for cisplatin-ineligible head and neck squamous cell carcinoma (HNSCC) has not been established. We aimed to evaluate the feasibility, efficacy, and safety of CCRT with weekly low-dose carboplatin for the treatment of advanced HNSCC in cisplatin-ineligible patients. Methods: This prospective phase II study enrolled adult patients (age ≥ 20 years) with HNSCC receiving whole-neck irradiation including bilateral levels II–IV who were either elderly (≥ 75 years old with an estimated glomerular filtration rate [eGFR] of 40 mL/min or better) or had renal dysfunction (< 75 years old with an eGFR of 30–60 mL/min). Carboplatin was administered weekly (area under the plasma concentration–time curve = 2.0) for up to seven cycles during concurrent radiotherapy (70 Gy/35 Fr). The primary endpoint was the completion rate of CCRT. Secondary endpoints included the overall response rate and the incidence of adverse events. Results: Among the 30 patients enrolled, 28 were men. The median age was 73.5 years; 17 patients were < 75 years old and 13 were ≥ 75 years old. The completion rate of CCRT was 90%. The overall response rate was 90%. Grade 3 adverse events occurring in 10% or more of patients were oral/pharyngeal mucositis (47%), leukocytopenia (20%), and neutropenia (10%). A grade 4 adverse event (elevated alanine aminotransferase level) occurred in one patient. No treatment-related deaths occurred. Conclusion: CCRT with weekly low-dose carboplatin is a promising treatment option, with favorable feasibility and efficacy and acceptable toxicity, for cisplatin-ineligible patients with advanced HNSCC. Clinical Trial Registration Number: jRCTs031190028.
Journal Article
Remixing-based Unsupervised Source Separation from Scratch
2023
We propose an unsupervised approach for training separation models from scratch using RemixIT and Self-Remixing, which are recently proposed self-supervised learning methods for refining pre-trained models. They first separate mixtures with a teacher model and create pseudo-mixtures by shuffling and remixing the separated signals. A student model is then trained to separate the pseudo-mixtures using either the teacher's outputs or the initial mixtures as supervision. To refine the teacher's outputs, the teacher's weights are updated with the student's weights. While these methods originally assumed that the teacher is pre-trained, we show that they are capable of training models from scratch. We also introduce a simple remixing method to stabilize training. Experimental results demonstrate that the proposed approach outperforms mixture invariant training, which is currently the only available approach for training a monaural separation model from scratch.
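For readers unfamiliar with the remixing recipe summarized above, the sketch below illustrates one teacher-student training step in PyTorch. It is an illustration only, not the paper's implementation: the teacher and student models, the loss function, and the EMA coefficient are placeholders.

```python
import torch

def remix_training_step(teacher, student, mixtures, optimizer, loss_fn, ema=0.999):
    """One teacher-student remixing step: separate, shuffle, remix, re-separate.

    `teacher` and `student` are placeholder separation models returning tensors
    of shape (batch, sources, time); all names here are illustrative.
    """
    with torch.no_grad():
        est = teacher(mixtures)                                   # (B, S, T)
        B, S, T = est.shape
        # Shuffle each source slot across the batch and sum into pseudo-mixtures.
        perms = torch.stack(
            [torch.randperm(B, device=est.device) for _ in range(S)], dim=1
        )                                                         # (B, S)
        shuffled = torch.gather(est, 0, perms.unsqueeze(-1).expand(B, S, T))
        pseudo_mix = shuffled.sum(dim=1)                          # (B, T)

    # The student learns to recover the teacher's (shuffled) estimates.
    loss = loss_fn(student(pseudo_mix), shuffled)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Refine the teacher by moving its weights toward the student's.
    with torch.no_grad():
        for pt, ps in zip(teacher.parameters(), student.parameters()):
            pt.mul_(ema).add_(ps, alpha=1.0 - ema)
    return loss.item()
```

In the from-scratch setting described in the abstract, both models would simply start from random weights; the paper's stabilizing remixing scheme is not reproduced here.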
Self-Remixing: Unsupervised Speech Separation via Separation and Remixing
2023
We present Self-Remixing, a novel self-supervised speech separation method, which refines a pre-trained separation model in an unsupervised manner. The proposed method consists of a shuffler module and a solver module, which grow together through separation and remixing processes. Specifically, the shuffler first separates observed mixtures and makes pseudo-mixtures by shuffling and remixing the separated signals. The solver then separates the pseudo-mixtures and remixes the separated signals back into the observed mixtures. The solver is trained using the observed mixtures as supervision, while the shuffler's weights are updated as a moving average of the solver's, so that it generates pseudo-mixtures with fewer distortions. Our experiments demonstrate that Self-Remixing outperforms existing remixing-based self-supervised methods at the same or lower training cost in the unsupervised setup. Self-Remixing also outperforms baselines in semi-supervised domain adaptation, showing its effectiveness in multiple setups.
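The distinguishing detail relative to the previous entry is the solver's supervision: its estimates of the pseudo-mixtures are remixed back into the observed mixtures before the loss is computed. Below is a minimal sketch under assumptions of ours (placeholder names; the solver's output slots are assumed to be aligned with the shuffler's, which the actual method must handle).

```python
import torch

def self_remixing_solver_loss(solver, observed_mix, shuffled_sources, perms, loss_fn):
    """Sketch: separate the pseudo-mixtures, undo the shuffle, remix, and compare
    against the observed mixtures (all argument names are illustrative).

    shuffled_sources: (B, S, T) shuffler outputs after the per-slot batch shuffle.
    perms:            (B, S) batch permutation applied to each source slot.
    """
    B, S, T = shuffled_sources.shape
    pseudo_mix = shuffled_sources.sum(dim=1)                     # (B, T)
    est = solver(pseudo_mix)                                     # (B, S, T)

    # Invert the per-slot batch permutation used by the shuffler.
    inv = torch.empty_like(perms)
    inv.scatter_(0, perms, torch.arange(B, device=perms.device).unsqueeze(1).expand(B, S))
    unshuffled = torch.gather(est, 0, inv.unsqueeze(-1).expand(B, S, T))

    remixed = unshuffled.sum(dim=1)                              # back to (B, T)
    return loss_fn(remixed, observed_mix)
```

The shuffler is then refined as a moving average of the solver's weights, as in the EMA step of the previous sketch.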
Spatial Loss for Unsupervised Multi-channel Source Separation
2022
We propose a spatial loss for unsupervised multi-channel source separation. The proposed loss exploits the duality of direction of arrival (DOA) estimation and beamforming: the steering and beamforming vectors should be aligned for the target source, but orthogonal for interfering ones. The spatial loss encourages consistency between the mixing and demixing systems obtained from a classic DOA estimator and a neural separator, respectively. With the proposed loss, we train neural separators based on minimum variance distortionless response (MVDR) beamforming and independent vector analysis (IVA). We also investigate the effectiveness of combining our spatial loss with a signal loss that uses the outputs of blind source separation as the reference. We evaluate the proposed method on synthetic and recorded (LibriCSS) mixtures. We find that the spatial loss is most effective for training IVA-based separators. For the neural MVDR beamformer, it performs best when combined with a signal loss. On synthetic mixtures, the proposed unsupervised loss leads to the same performance as a supervised loss in terms of word error rate. On LibriCSS, we obtain close to state-of-the-art performance without any labeled training data.
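The duality described above can be stated compactly: the inner product between a beamforming vector and a steering vector should be close to 1 for the matching source and close to 0 for the others. The following is a hypothetical sketch of such a penalty; the variable names, shapes, and unit-norm assumption are ours, not the paper's exact loss.

```python
import torch

def spatial_loss(bf_vectors, steering_vectors):
    """Hypothetical duality penalty.

    bf_vectors:       (F, N, M) complex beamforming/demixing vectors per frequency bin.
    steering_vectors: (F, N, M) complex steering vectors from a DOA estimator.
    Both are assumed unit-normalized along the microphone axis M.
    """
    # Inner products w_n^H a_k for every source pair, per frequency bin.
    inner = torch.einsum("fnm,fkm->fnk", bf_vectors.conj(), steering_vectors)
    eye = torch.eye(inner.shape[1], device=inner.device)
    # Aligned (|.| near 1) on the diagonal, orthogonal (|.| near 0) off the diagonal.
    return (inner.abs() - eye).pow(2).mean()
```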
Independence-based Joint Dereverberation and Separation with Neural Source Model
by Scheibler, Robin; Saijo, Kohei
in Artificial neural networks, Back propagation, Back propagation networks
2022
We propose an independence-based joint dereverberation and separation method with a neural source model. We introduce a neural network into the framework of time-decorrelation iterative source steering, which is an extension of independent vector analysis to joint dereverberation and separation. The network is trained end-to-end with a permutation-invariant loss on the time-domain separated output signals. The proposed method can be applied in any situation with at least as many microphones as sources, regardless of their number. In experiments, we demonstrate that our method achieves high performance in terms of both speech quality metrics and word error rate (WER), even for mixtures with a different number of speakers than seen during training. Furthermore, the model trained on synthetic mixtures greatly reduces the WER on the recorded LibriCSS dataset without any modifications.
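The abstract mentions end-to-end training with a permutation-invariant loss on the time-domain outputs. As a generic illustration (not the paper's time-decorrelation iterative source steering framework), a permutation-invariant wrapper for a small number of sources might look as follows, with `pair_loss` standing in for any per-signal loss:

```python
import itertools
import torch

def pit_loss(estimates, references, pair_loss):
    """Generic permutation-invariant loss for (B, S, T) time-domain signals.

    pair_loss maps two (B, T) signals to a per-example loss of shape (B,).
    Every source permutation is scored and the best one is kept per example.
    """
    B, S, _ = estimates.shape
    per_perm = []
    for perm in itertools.permutations(range(S)):
        l = torch.stack(
            [pair_loss(estimates[:, s], references[:, p]) for s, p in enumerate(perm)],
            dim=1,
        ).mean(dim=1)                                            # (B,)
        per_perm.append(l)
    return torch.stack(per_perm, dim=1).min(dim=1).values.mean()
```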
Remix-cycle-consistent Learning on Adversarially Learned Separator for Accurate and Stable Unsupervised Speech Separation
2022
A new learning algorithm for speech separation networks is designed to explicitly reduce residual noise and artifacts in the separated signals in an unsupervised manner. Generative adversarial networks are known to be effective for building separation networks when ground truth for the observed signal is inaccessible. However, their weak objectives, aimed at distribution-to-distribution mapping, make learning unstable and limit performance. This study introduces the remix-cycle-consistency loss as a more appropriate objective function and uses it to fine-tune adversarially learned source separation models. The remix-cycle-consistency loss is defined as the difference between the mixed speech observed at the microphones and the pseudo-mixed speech obtained by alternating between separating the mixed sound and remixing its outputs in another combination. Minimizing this loss leads to an explicit reduction of the distortions in the output of the separation network. Experimental comparisons on multichannel speech separation demonstrated that the proposed method achieved high separation accuracy and learning stability comparable to supervised learning.
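In symbols (notation ours, not the paper's), with S the separator, R_pi the remixing of separated outputs under a cross-utterance combination pi, and x the observed microphone mixtures, the cycle closes back on the observed mixtures:

```latex
\mathcal{L}_{\text{remix-cycle}}
  = \bigl\| \, x - R_{\pi^{-1}}\bigl( S\bigl( R_{\pi}( S(x) ) \bigr) \bigr) \, \bigr\|
```

Minimizing this quantity drives the separate-remix-separate-remix cycle to reproduce the original mixtures, which requires the intermediate separations to carry little residual noise or artifacts.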
Beyond Performance Plateaus: A Comprehensive Study on Scalability in Speech Enhancement
2024
Deep learning-based speech enhancement (SE) models have achieved impressive performance in the past decade. Numerous advanced architectures have been designed to deliver state-of-the-art performance; however, their scalability potential remains largely unexplored. Meanwhile, the majority of research focuses on small datasets with restricted diversity, leading to a plateau in performance improvement. In this paper, we aim to provide new insights into these issues by exploring the scalability of SE models in terms of architectures, model sizes, compute budgets, and dataset sizes. Our investigation covers several popular SE architectures and speech data from different domains. Experiments reveal both similarities and distinctions between the scaling effects in SE and those in other tasks such as speech recognition. These findings further provide insights into under-explored SE directions, e.g., larger-scale multi-domain corpora and efficiently scalable architectures.
URGENT Challenge: Universality, Robustness, and Generalizability For Speech Enhancement
2024
The last decade has witnessed significant advancements in deep learning-based speech enhancement (SE). However, most existing SE research is limited in its coverage of SE sub-tasks, in data diversity and amount, and in evaluation metrics. To fill this gap and promote research toward universal SE, we establish a new SE challenge, named URGENT, that focuses on the universality, robustness, and generalizability of SE. We aim to extend the definition of SE to cover different sub-tasks, starting from denoising, dereverberation, bandwidth extension, and declipping, in order to explore the limits of SE models. A novel framework is proposed to unify all these sub-tasks in a single model, allowing the use of all existing SE approaches. We collected public speech and noise data from different domains to construct diverse evaluation data. Finally, we discuss the insights gained from our preliminary baseline experiments based on both generative and discriminative SE methods with 12 curated metrics.
Task-Aware Unified Source Separation
by Germain, François G; Le Roux, Jonathan; Wichern, Gordon
in Audio data, Handles, Musical instruments
2024
Several attempts have been made to handle multiple source separation tasks such as speech enhancement, speech separation, sound event separation, music source separation (MSS), or cinematic audio source separation (CASS) with a single model. These models are trained on large-scale data including speech, instruments, or sound events and can often successfully separate a wide range of sources. However, it is still challenging for such models to cover all separation tasks because some of them are contradictory (e.g., musical instruments are separated in MSS while they have to be grouped in CASS). To overcome this issue and support all the major separation tasks, we propose a task-aware unified source separation (TUSS) model. The model uses a variable number of learnable prompts to specify which source to separate, and changes its behavior depending on the given prompts, enabling it to handle all the major separation tasks including contradictory ones. Experimental results demonstrate that the proposed TUSS model successfully handles the five major separation tasks mentioned earlier. We also provide some audio examples, including both synthetic mixtures and real recordings, to demonstrate how flexibly the TUSS model changes its behavior at inference depending on the prompts.
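As a rough illustration of the prompt mechanism (a toy sketch, not the TUSS architecture: the encoder/decoder stand-ins, layer sizes, and prompt names are all invented), a learnable embedding per source type can be prepended to the encoded mixture, and the model emits one output per prompt:

```python
import torch
import torch.nn as nn

class PromptConditionedSeparator(nn.Module):
    """Toy prompt-conditioned separator (illustrative only, not TUSS)."""

    def __init__(self, prompt_names, dim=256):
        super().__init__()
        # One learnable prompt embedding per source type.
        self.prompts = nn.ParameterDict(
            {name: nn.Parameter(torch.randn(dim)) for name in prompt_names}
        )
        self.encoder = nn.Linear(1, dim)     # stand-in for a real feature encoder
        layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=2)
        self.decoder = nn.Linear(dim, 1)     # stand-in for a real decoder

    def forward(self, mixture, prompt_names):
        # mixture: (B, T) waveform -> (B, T, dim) toy frame features.
        feats = self.encoder(mixture.unsqueeze(-1))
        B, T, D = feats.shape
        # Prepend the selected prompt tokens; the number of outputs equals
        # the number of prompts given at inference time.
        prompts = torch.stack([self.prompts[n] for n in prompt_names])      # (P, D)
        tokens = torch.cat([prompts.expand(B, -1, -1), feats], dim=1)       # (B, P+T, D)
        h = self.backbone(tokens)
        P = len(prompt_names)
        # Each prompt token modulates its own copy of the frame features (toy readout).
        outs = [self.decoder(h[:, P:] + h[:, i : i + 1]).squeeze(-1) for i in range(P)]
        return torch.stack(outs, dim=1)                                      # (B, P, T)
```

A single instance constructed with all prompt names could then be called with different prompt subsets, e.g. grouped speech-plus-effects stems for a CASS-style task versus individual instruments for MSS, mimicking how one prompted model switches between tasks.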
Leveraging Audio-Only Data for Text-Queried Target Sound Extraction
2024
The goal of text-queried target sound extraction (TSE) is to extract from a mixture a sound source specified by a natural-language caption. While access to large-scale text-audio pairs is preferable for covering a wide variety of text prompts, the limited number of available high-quality text-audio pairs hinders data scaling. To address this, this work explores how to leverage audio-only data, without any captions, for the text-queried TSE task in order to scale up the amount of training data. A straightforward way to do so is to use a joint audio-text embedding model, such as the contrastive language-audio pre-training (CLAP) model, as the query encoder and to train the TSE model using audio embeddings obtained from the ground-truth audio. The TSE model can then accept text queries at inference time by switching to the text encoder. While this approach would work if the audio and text embedding spaces of CLAP were well aligned, in practice the embeddings carry domain-specific information that causes the TSE model to overfit to audio queries. We investigate several methods to avoid this overfitting and show that simple embedding-manipulation methods such as dropout can effectively alleviate the issue. Extensive experiments demonstrate that using audio-only data with embedding dropout is as effective as using text captions during training, and that audio-only data can be effectively leveraged to improve text-queried TSE models.
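The embedding-dropout idea might look as follows in training and inference pseudocode. The `clap.embed_audio` / `clap.embed_text` method names are placeholders rather than the actual CLAP API, and the TSE model and loss are likewise stand-ins.

```python
import torch
import torch.nn.functional as F

def training_step(tse_model, clap, mixture, target_audio, p_drop=0.2):
    """Training sketch: condition the TSE model on the CLAP *audio* embedding of the
    ground-truth source, with dropout on the embedding to discourage overfitting to
    audio-specific information. All names are illustrative placeholders.
    """
    with torch.no_grad():
        query = clap.embed_audio(target_audio)           # (B, D) audio query embedding
    query = F.dropout(query, p=p_drop, training=True)    # embedding dropout
    est = tse_model(mixture, query)
    return F.l1_loss(est, target_audio)

def extract_with_text(tse_model, clap, mixture, caption):
    """Inference sketch: switch to the text encoder and query with a caption."""
    with torch.no_grad():
        query = clap.embed_text([caption])                # (1, D) text query embedding
        return tse_model(mixture, query)
```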