73 result(s) for "Fu, Jinlan"
The Relationship Between Basic Psychological Needs Satisfaction and Career Adaptability Among University Students: The Roles of Grit and Career Decision-Making Self-Efficacy
Enhancing the career adaptability of university students is a practical necessity for addressing the challenge of student employment. This study explores the relationship between basic psychological needs satisfaction and career adaptability among university students based on Basic Psychological Need Theory and Social Cognitive Career Theory and constructs a corresponding chain mediation model. A survey was conducted among 635 university students from six provinces across China. The results indicate the following findings: (1) grit partially mediates the relationship between basic psychological needs satisfaction and career adaptability among university students; (2) career decision-making self-efficacy also partially mediates this relationship; and (3) grit and career decision-making self-efficacy serve as chain mediators in the relationship between basic psychological needs satisfaction and career adaptability. This study provides empirical support and significant guidance for enhancing the career adaptability development of university students.
Introduction of hospital-based health technology assessment in China: experiences from seven pilot hospitals
This study aimed to introduce a pilot program for hospital-based health technology assessment (HB-HTA) in China and present the participants' experiences based on seven case studies from seven tertiary hospitals. One-year pilot projects were initiated at the beginning of 2018. Seven pilot hospitals were closely followed from the beginning until the completion of their pilot HTA project. Regular interviews were conducted with the hospital managers leading the HB-HTA projects and key members of the special HTA teams. Observations were made based on field trips and written HTA reports. Three pilot projects evaluated the use of medical consumables, three evaluated the use of surgical or medical interventions, and one evaluated an innovative management model for ventilators. Real-world data were collected from all the pilot projects to assist with the assessments. Most HB-HTA pilot projects achieved remarkable results such as improvements in economic efficiency; however, there were also obvious deficiencies such as the lack of a necessary cost-effectiveness analysis. The results varied among the seven HB-HTA pilot projects. The HB-HTA pilot program was implemented to promote the use of HB-HTA in hospital decision making in China. At the same time, HB-HTA in China faces challenges. We have made some policy recommendations based on the findings of the pilot projects.
OmniDialog: An Omnipotent Pre-training Model for Task-Oriented Dialogue System
Pre-trained conversation models (PCMs) have demonstrated remarkable results in task-oriented dialogue (TOD) systems. Many PCMs focus predominantly on dialogue management tasks like dialogue state tracking, dialogue generation tasks like response generation, or both. However, the existing PCMs seldom consider dialogue comprehension tasks, such as dialogue question answering and summarization tasks. These tasks allow PCMs to glean dialogue context from various angles. This observation naturally raises the question: Can the performance of downstream dialogue tasks be enhanced if a PCM is pre-trained on dialogue management, generation, and comprehension tasks? To investigate this, we proposed an Omnipotent Dialogue pre-training model (OmniDialog). It unifies these three dialogue tasks into a monolithic framework by multi-task learning, fostering inter-task communication. The pre-training corpus of OmniDialog spans 7 dialogue-focused tasks, drawing from 15 datasets and encompassing over 3.2 million dialogue utterances. To our knowledge, OmniDialog is a pioneering PCM pre-trained across dialogue management, generation, and comprehension domains. We evaluated its performance across four tasks: dialogue summarization, end-to-end dialogue modeling, dialogue state tracking, and intent classification. The results underscore its efficacy in domain transfer learning, low-resource, and full-dataset scenarios. Furthermore, to glean a nuanced understanding of OmniDialog's strengths and potential pitfalls, we designed a fine-grained analysis framework for dialogue-centric tasks. Experimental results show that OmniDialog performs well on hard samples, such as long dialogues and lengthy responses.
Unveiling In-Context Learning: A Coordinate System to Understand Its Working Mechanism
Large language models (LLMs) exhibit remarkable in-context learning (ICL) capabilities. However, the underlying working mechanism of ICL remains poorly understood. Recent research presents two conflicting views on ICL: One emphasizes the impact of similar examples in the demonstrations, stressing the need for label correctness and more shots. The other attributes it to LLMs' inherent ability of task recognition, deeming label correctness and shot numbers of demonstrations as not crucial. In this work, we provide a Two-Dimensional Coordinate System that unifies both views into a systematic framework. The framework explains the behavior of ICL through two orthogonal variables: whether similar examples are presented in the demonstrations (perception) and whether LLMs can recognize the task (cognition). We propose the peak inverse rank metric to detect the task recognition ability of LLMs and study LLMs' reactions to different definitions of similarity. Based on these, we conduct extensive experiments to elucidate how ICL functions across each quadrant on multiple representative classification tasks. Finally, we extend our analyses to generation tasks, showing that our coordinate system can also be used to interpret ICL for generation tasks effectively.
Polyglot Prompt: Multilingual Multitask PrompTraining
This paper aims for a potential architectural improvement for multilingual learning and asks: Can different tasks from different languages be modeled in a monolithic framework, i.e. without any task/language-specific module? The benefit of achieving this could open new doors for future multilingual research, including allowing systems trained on low resources to be further assisted by other languages as well as other tasks. We approach this goal by developing a learning framework named Polyglot Prompting to exploit prompting methods for learning a unified semantic space for different languages and tasks with multilingual prompt engineering. We performed a comprehensive evaluation of 6 tasks, namely topic classification, sentiment classification, named entity recognition, question answering, natural language inference, and summarization, covering 24 datasets and 49 languages. The experimental results demonstrated the efficacy of multilingual multitask prompt-based learning and led to inspiring observations. We also present an interpretable multilingual evaluation methodology and show how the proposed framework, multilingual multitask prompt training, works. We release all datasets prompted in the best setting and code.
HERMES: KV Cache as Hierarchical Memory for Efficient Streaming Video Understanding
Recent advancements in Multimodal Large Language Models (MLLMs) have demonstrated significant improvement in offline video understanding. However, extending these capabilities to streaming video inputs remains challenging, as existing models struggle to simultaneously maintain stable understanding performance, real-time responses, and low GPU memory overhead. To address this challenge, we propose HERMES, a novel training-free architecture for real-time and accurate understanding of video streams. Based on a mechanistic attention investigation, we conceptualize KV cache as a hierarchical memory framework that encapsulates video information across multiple granularities. During inference, HERMES reuses a compact KV cache, enabling efficient streaming understanding under resource constraints. Notably, HERMES requires no auxiliary computations upon the arrival of user queries, thereby guaranteeing real-time responses for continuous video stream interactions and achieving 10× faster TTFT than prior SOTA. Even when reducing video tokens by up to 68% compared with uniform sampling, HERMES achieves superior or comparable accuracy across all benchmarks, with up to 11.4% gains on streaming datasets.
FutureOmni: Evaluating Future Forecasting from Omni-Modal Context for Multimodal LLMs
Although Multimodal Large Language Models (MLLMs) demonstrate strong omni-modal perception, their ability to forecast future events from audio-visual cues remains largely unexplored, as existing benchmarks focus mainly on retrospective understanding. To bridge this gap, we introduce FutureOmni, the first benchmark designed to evaluate omni-modal future forecasting from audio-visual environments. The evaluated models are required to perform cross-modal causal and temporal reasoning, as well as effectively leverage internal knowledge to predict future events. FutureOmni is constructed via a scalable LLM-assisted, human-in-the-loop pipeline and contains 919 videos and 1,034 multiple-choice QA pairs across 8 primary domains. Evaluations on 13 omni-modal and 7 video-only models show that current systems struggle with audio-visual future prediction, particularly in speech-heavy scenarios, with the best accuracy of 64.8% achieved by Gemini 3 Flash. To mitigate this limitation, we curate a 7K-sample instruction-tuning dataset and propose an Omni-Modal Future Forecasting (OFF) training strategy. Evaluations on FutureOmni and popular audio-visual and video-only benchmarks demonstrate that OFF enhances future forecasting and generalization. We publicly release all code (https://github.com/OpenMOSS/FutureOmni) and datasets (https://huggingface.co/datasets/OpenMOSS-Team/FutureOmni).
CET2: Modelling Topic Transitions for Coherent and Engaging Knowledge-Grounded Conversations
Knowledge-grounded dialogue systems aim to generate coherent and engaging responses based on the dialogue contexts and selected external knowledge. Previous knowledge selection methods tend to rely too heavily on the dialogue contexts or over-emphasize the new information in the selected knowledge, resulting in the selection of repetitious or incongruous knowledge and further generating repetitive or incoherent responses, as the generation of the response depends on the chosen knowledge. To address these shortcomings, we introduce a Coherent and Engaging Topic Transition (CET2) framework to model topic transitions for selecting knowledge that is coherent to the context of the conversations while providing adequate knowledge diversity for topic development. Our CET2 framework considers multiple factors for knowledge selection, including valid transition logic from dialogue contexts to the following topics and systematic comparisons between available knowledge candidates. Extensive experiments on two public benchmarks demonstrate the superiority and the better generalization ability of CET2 on knowledge selection. This is due to our well-designed transition features and comparative knowledge selection strategy, which are more transferable to conversations about unseen topics. Analysis of fine-grained knowledge selection accuracy also shows that CET2 can better balance topic entailment (contextual coherence) and development (knowledge diversity) in dialogue than existing approaches.
VisiPruner: Decoding Discontinuous Cross-Modal Dynamics for Efficient Multimodal LLMs
Multimodal Large Language Models (MLLMs) have achieved strong performance across vision-language tasks, but suffer from significant computational overhead due to the quadratic growth of attention computations with the number of multimodal tokens. Though efforts have been made to prune tokens in MLLMs, they lack a fundamental understanding of how MLLMs process and fuse multimodal information. Through systematic analysis, we uncover a three-stage cross-modal interaction process: (1) Shallow layers recognize task intent, with visual tokens acting as passive attention sinks; (2) Cross-modal fusion occurs abruptly in middle layers, driven by a few critical visual tokens; (3) Deep layers discard vision tokens, focusing solely on linguistic refinement. Based on these findings, we propose VisiPruner, a training-free pruning framework that reduces up to 99% of vision-related attention computations and 53.9% of FLOPs on LLaVA-v1.5 7B. It significantly outperforms existing token pruning methods and generalizes across diverse MLLMs. Beyond pruning, our insights further provide actionable guidelines for training efficient MLLMs by aligning model architecture with its intrinsic layer-wise processing dynamics. Our code is available at: https://github.com/EIT-NLP/VisiPruner.