Asset Details
Grounded-Knowledge-Enhanced Instruction Understanding for Multimodal Assistant Applications
by Wu, Te-Lin
in Artificial Intelligence / Computer Engineering / Computer Science
2024
Dissertation
Overview
With recent advancements in artificial intelligence (AI), researchers are endeavouring to build an AI that can understand humans, collaborate with them, and help or guide them in accomplishing everyday chores. Actualizing such an assistant AI poses several challenges, including planning (over certain events), comprehending human instructions, multimodal understanding, and grounded conversational ability.

Imagine a scenario in which one wishes to perform a task, such as “making a plate of fried rice” or “purchasing a suitable sofa bed”, which can require multiple steps of actions and the manipulation of certain objects. How would an assistant AI collaborate with humans to accomplish such desired tasks? One crucial aspect of the system is understanding how and when to take a certain action, which is often learned by interpreting and following guidance: a resource that encompasses knowledge about accomplishing the task and, potentially, the events that may occur during task completion. Such guidance can come from human verbal interactions (e.g., in the form of a conversation or a question) or from static written instructional manuals.

In the first part of this thesis, I decompose the proposed system framework into three foundational components: (1) task-step sequencing/planning, where the AI needs to understand the appropriate sequential procedure for performing each sub-task to accomplish the whole task, especially when task knowledge is learned from online instructional resources, which can be numerous and do not always come consolidated with proper ordering; (2) action-dependency understanding, where an agent should be able to infer the dependencies of performing an action and the outcomes of executing it, in order to assess the situation and adjust its plan for accomplishing the task; and (3) multimodal grounding and active perception, where we equip the AI with the ability to actively ground its visually perceived surroundings to textual instructions (or verbal interactions) and to reason over multimodal information throughout task completion.

In the second part of this thesis, I introduce two newly curated resources that anticipate the next-phase challenges in building a strong and helpful assistive AI. One resource focuses on counterfactual reasoning, a capability humans frequently rely on in complex decision-making; the other presents a comprehensive suite of multimodal capabilities an assistive AI needs in order to function in a virtually created world.

Combining the two parts, the foundational components and the newly established challenging benchmarks, this thesis aims to provide a comprehensive research road map for next-generation (multimodal) AI assistants.
Publisher
ProQuest Dissertations & Theses
ISBN
9798382790398