Deliberation Model for On-Device Spoken Language Understanding
by Tomasello, Paden; Livshits, Aleksandr; Seltzer, Michael L; Kalinli, Ozlem; Le, Duc; Shrivastava, Akshat; Kim, Suyoun
in Automatic speech recognition / Hypotheses / Semantics
2022
Paper
Overview
We propose a novel deliberation-based approach to end-to-end (E2E) spoken language understanding (SLU), where a streaming automatic speech recognition (ASR) model produces the first-pass hypothesis and a second-pass natural language understanding (NLU) component generates the semantic parse by conditioning on both ASR's text and audio embeddings. By formulating E2E SLU as a generalized decoder, our system is able to support complex compositional semantic structures. Furthermore, the sharing of parameters between ASR and NLU makes the system especially suitable for resource-constrained (on-device) environments; our proposed approach consistently outperforms strong pipeline NLU baselines by 0.60% to 0.65% on the spoken version of the TOPv2 dataset (STOP). We demonstrate that the fusion of text and audio features, coupled with the system's ability to rewrite the first-pass hypothesis, makes our approach more robust to ASR errors. Finally, we show that our approach can significantly reduce the degradation when moving from natural speech to synthetic speech training, but more work is required to make text-to-speech (TTS) a viable solution for scaling up E2E SLU.
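The second-pass idea in the overview, conditioning on both the first-pass text and the audio embeddings, can be illustrated with a toy sketch. This is not the authors' implementation: the function names (`attend`, `deliberation_context`), the use of scaled dot-product attention, and fusion by concatenation are all assumptions made for illustration, since the overview does not specify how the two sources are combined.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, memory):
    # Scaled dot-product attention of one query vector over a list of
    # memory vectors; returns the attention-weighted average of memory.
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, vec)) / scale for vec in memory]
    weights = softmax(scores)
    dim = len(memory[0])
    return [sum(w * vec[i] for w, vec in zip(weights, memory)) for i in range(dim)]

def deliberation_context(query, text_embeddings, audio_embeddings):
    # Hypothetical fusion step: attend separately to the first-pass text
    # embeddings and to the audio embeddings, then concatenate the two
    # context vectors (concatenation is an assumption of this sketch).
    return attend(query, text_embeddings) + attend(query, audio_embeddings)

# Toy inputs: two first-pass token embeddings and three audio frame embeddings.
text_embeddings = [[1.0, 0.0], [0.0, 1.0]]
audio_embeddings = [[0.5, 0.5], [0.5, 0.5], [0.5, 0.5]]
query = [1.0, 0.0]  # stand-in for a second-pass decoder state

context = deliberation_context(query, text_embeddings, audio_embeddings)
```

In this sketch the decoder state sees a context vector whose first half comes from the text hypothesis and whose second half comes from the audio, so evidence from the audio remains available even when the first-pass text contains recognition errors.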
Publisher
Cornell University Library, arXiv.org